Abstract

High-Speed  Rapid-Single-Flux-Quantum  Multiplexer  and  Demultiplexer  Design  and  Testing 

by 

Lizhen  Zheng 

Doctor  of  Philosophy  in  Engineering-Electrical  Engineering  and  Computer  Sciences 

University  of  California,  Berkeley 
Professor  Theodore  Van  Duzer,  Chair 

Superconductor electronics excels in high operating speed and low power consumption (several
orders of magnitude lower than that of equivalent semiconductor circuits). Rapid-Single-Flux-Quantum
(RSFQ) circuits, in which information is stored in superconductor loops as tiny magnetic
flux quanta and transferred as voltage pulses several picoseconds wide with quantized area
($\int V(t)\,dt = \Phi_0 = 2.07$ mV·ps), have been demonstrated to work at a few tens of gigahertz with the current
niobium process and have the potential to work up to a few hundred gigahertz with technology scaling.
A large superconductor RSFQ system, or a hybrid system combined with low-power, high-density
cryogenic CMOS memory, can be realized with a multi-chip module (MCM) packaging
technique.

The goal of this thesis project is to design and experimentally demonstrate 20-50 GHz operation
of a 1:8 demultiplexer (DEMUX) and an 8:1 multiplexer (MUX). The DEMUX and MUX are
important interface circuits that are required to take advantage of the ultra-high speed of RSFQ
logic. They are needed to interface the superconductor circuits with the lower-speed semiconductor
circuits in a hybrid system. In a superconducting MCM system, the DEMUX and MUX can be used
to convert the data rate between chips.

The speed of RSFQ circuits scales with the process technology. An analysis shows
that the maximum speed of RSFQ circuits is proportional to the product of the shunted Josephson
junction's critical current and its shunt resistance (the $I_cR$ value). Furthermore, $I_cR$ is proportional
to the square root of the junction's critical current density ($J_c$) in the low-$T_c$ niobium process.
Superconductor integrated circuits using a 1 kA/cm², 3.5 μm niobium fabrication technology can operate
up to 30-40 GHz. Simulations reveal that simple RSFQ elements and gates based on a 6.5 kA/cm² technology
can operate up to 70-100 GHz. With typical circuit parameters, the minimum features are
around 1.35 μm. Taking into account the possibly larger process variations caused by the reduced feature
size and thinner junction barrier layer, operation of the DEMUX and MUX circuits at 50 GHz is taken
as a reasonable and challenging design goal.
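
As a rough consistency check on the scaling argument above, the following sketch applies the stated square-root dependence on $J_c$ to the 30-40 GHz figure quoted for the 1 kA/cm² process. The function name is illustrative, and the calculation assumes that clock speed scales exactly as $\sqrt{J_c}$ with nothing else changing, which is a simplification of the simulations referred to in the text.

```python
import math

def scaled_speed_ghz(speed_ghz, jc_old_ka_cm2, jc_new_ka_cm2):
    """Scale a clock speed assuming speed ~ I_c*R ~ sqrt(J_c)."""
    return speed_ghz * math.sqrt(jc_new_ka_cm2 / jc_old_ka_cm2)

# 30-40 GHz at 1 kA/cm^2, scaled to 6.5 kA/cm^2
low = scaled_speed_ghz(30.0, 1.0, 6.5)
high = scaled_speed_ghz(40.0, 1.0, 6.5)
print(f"{low:.0f}-{high:.0f} GHz")
```

Under this assumption the scaled range comes out close to the 70-100 GHz obtained from simulation.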

20 GHz multiplexers (8:1, 4:1 and 2:1) and 20 GHz demultiplexers (1:8, 1:4 and 1:2) were
designed and fabricated using the 1 kA/cm² process. With external test equipment, correct
functioning of a 1:4 DEMUX was observed up to 9.2 GHz. A test result of 3.5 GHz was
achieved for a 2:1 MUX. When the designs were migrated to 50 GHz using a 6.5 kA/cm² process,
all the circuit components were re-optimized for the new process and the higher operation speed. A
few specialized optimization tools were used to maximize the circuit parameter margins and
yields. It was found necessary to do post-layout re-optimization that includes the parasitic
inductances. Monte Carlo analyses based on process variations were performed to predict the circuit
yield and timing variations.

When the clock speed is above 20 GHz, verification of RSFQ circuits using external test
equipment is not feasible, owing to the unavailability of room-temperature test equipment and the heavy
dispersion along the cables. A data-driven self-timed (DDST) on-chip test system was redesigned
and optimized for 50 GHz, assuming a 6.5 kA/cm² process.

The 50 GHz 2-bit DEMUX, the basic cells of the MUX and the high-speed test system layouts
were fabricated in the UCB 6.5 kA/cm² process. However, due to an irreparable failure of the fabrication
process, the chips could not be verified by testing.




Elizabeth,  Andrew  and  my  parents 
CHAPTER 1

An Overview of Rapid-Single-Flux-Quantum Logic and Circuits


1.1  Introduction 

Superconductor devices and electronics offer unique high performance and find
niche applications where traditional semiconductor electronics cannot provide the needed
performance [1] [2].

The  main  advantages  of  superconductor  circuits  include: 

1. High operation speed combined with low power consumption. Rapid-Single-Flux-Quantum
(RSFQ) circuits in the current technology can work at a few tens of gigahertz, with the potential to
operate above 100 GHz with scaled device size [3] [4]. A basic T flip-flop was demonstrated at 750
GHz with 0.5 μm feature size. The power consumption of superconductor circuits is a few
orders of magnitude lower than that of semiconductor circuits. The switching energy of a typical 200 μA
junction is $4 \times 10^{-19}$ J. A rich library of basic cells such as flip-flops, buffers, adders, multipliers,
clock generator circuits, and phase-locking circuits has been developed. Superconductor technology
finds applications in ultra-fast digital signal processing (DSP) circuits, network switching and
supercomputing. A 20 GHz microprocessor based on the 4 kA/cm², 1.75 μm low-$T_c$ niobium process,
including 25,000 Josephson junctions on a 5 mm × 5 mm chip, was designed as part of the
Hybrid-Technology-Multi-Threaded (HTMT) project aiming at $10^{15}$ floating point operations per
second [5]. A multi-gigabit network switch was demonstrated in a hybrid system including
photodetectors [6]. More recent switch circuit components have been demonstrated at a few tens of gigahertz [7].

2. Low noise and low pulse dispersion. Lossless ultra-high-Q passive superconductor microwave
filters offer unmatched sharpness, low noise figure, and interference rejection in cellular
base station RF receivers [8].

3. The superconducting quantum interference device (SQUID) based sensor can detect a single
flux quantum ($\Phi_0 = 2.07 \times 10^{-15}$ Wb). This high sensitivity is applied in superconductor
magnetoencephalography (MEG) systems for imaging the human brain. It also provides high sensitivity
and linearity to the superconductor analog-to-digital converter (ADC). Recently, RSFQ
superconductor ADC technology has been envisioned as an enabling technology for software
defined radio (SDR). In SDR receivers, ADCs digitize RF signals directly from the antenna with
sufficient resolution, and all subsequent signal processing can be implemented in the digital
domain. The tens-of-gigahertz operation of RSFQ DSP circuits enables high-speed digital
down-conversion. With such a prospect, a set of ADCs could cover the spectrum from dc to a few
gigahertz, each providing more than 100 dB of SFDR in its own band [9] [10].
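
The switching energy quoted in item 1 can be checked with a one-line estimate. The sketch below assumes the common approximation that one $2\pi$ phase slip of a junction dissipates about $I_c\Phi_0$; the text does not state which formula was used, so this is only a plausibility check, and the function name is illustrative.

```python
PHI0_WB = 2.0678e-15  # magnetic flux quantum h/2e, in webers

def switching_energy_j(ic_amp):
    """Approximate energy of one 2*pi phase slip: E ~ I_c * Phi_0 (assumed formula)."""
    return ic_amp * PHI0_WB

energy = switching_energy_j(200e-6)  # the 200 uA junction from the text
print(f"{energy:.2e} J")
```

For a 200 μA junction this gives roughly $4 \times 10^{-19}$ J, in line with the figure quoted above.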

However, superconductor integrated circuits need to operate under special conditions. First,
low-$T_c$ superconductor (LTS) circuits operate at a few kelvins, with a cryocooler or
immersed in liquid helium. High-$T_c$ superconductor (HTS) circuits operate at a few tens of
kelvins, with a cryocooler or immersed in liquid nitrogen. Another difficulty in using superconductor
ICs is flux trapping. The earth's field is about 500 mG. Magnetic shielding to reduce the ambient
field to less than 10 μG is desired. Even with that and special layout precautions, the power
supply currents and the signal noise in the circuit may still trigger flux trapping.


1.2  Device  and  Physics 

1.2.1  Josephson  Junction 

The active device in superconductor electronics is the Josephson junction, a two-terminal
device consisting of an electrically weak contact between two superconductor electrodes. In 1962, B. D.
Josephson predicted that it should be possible for electron pairs to tunnel between closely spaced
superconductors even without a potential difference [11]. Anderson and Rowell made an observation
of the Josephson effect in 1963 [12].


There are numerous ways to form Josephson junctions. At present, the most common practice
in low-temperature superconductor (LTS) electronics is to use a niobium trilayer (Nb/AlOx/Nb)
structure, as shown in Fig. 1.1a. The top and bottom layers are niobium, which is a superconductor
below 9.2 K. In the middle is a thin insulating layer of AlOx, about 1 nm thick. The barrier
is thin enough for the electron-pair wave functions of the two superconductors to couple with each
other, so that electron pairs can tunnel from one superconductor electrode to the other
even with zero voltage applied across the junction. Such a Josephson junction is
also called an SIS tunnel junction. Fig. 1.1b shows the circuit symbol of a Josephson junction.

Figure 1.1 SIS Josephson junction. (a) The physical structure. (b) The circuit symbol.

A simple quantum-mechanical derivation [13] gives the Josephson relations, which can be
expressed in two equations:

$$I = I_c \sin\phi \qquad (1.1)$$

where the constant $I_c$ is the critical current of the Josephson junction, $\phi$ is the phase difference
of the pair wave functions in the two superconductors, and $I$ is the pair current tunneling through the
junction; and

$$\frac{\partial\phi}{\partial t} = \frac{2e}{\hbar}V = \frac{2\pi}{\Phi_0}V \qquad (1.2)$$

where $t$ is time, $e$ is the electron charge, $\hbar$ is Planck's constant divided by $2\pi$, and $V$ is the
voltage across the junction. $\Phi_0 = h/2e = 2.0679 \times 10^{-15}$ Wb is the flux quantum.

As we can see from the above two equations, with zero applied voltage the phase difference $\phi$
remains constant, and a pair current less than $I_c$ can tunnel through the junction. This is called the
dc Josephson effect.


It can be inferred from Eqs. (1.1) and (1.2) that the coupling of the wave functions reduces the
system energy by an amount (for small junctions)

$$E_c = \frac{\hbar I_c}{2e}\cos\phi \qquad (1.3)$$

When $\phi = 0$, the current is zero and the coupling energy has its maximum value. When $\phi$
approaches $\pi/2$, the tunneling current reaches its maximum $I_c$, and the coupling energy is reduced
to zero. For higher currents, the wave functions become uncoupled; a voltage appears across the
junction and varies according to Eq. (1.2).

Figure 1.2 The RSJ circuit model of a Josephson tunnel junction, after Fig. 4.09a in [1].

The Josephson relations above describe only the pair current in the Josephson junction. There also
exists single-particle tunneling in the junction when a potential difference is applied. The
well-accepted RSJ (Resistor Shunted Junction) or CRSJ (Capacitor Resistor Shunted Junction)
equivalent circuit model, shown in Fig. 1.2, can be used to analyze the Josephson junction.
The pair current is the leftmost branch, labeled $I_c\sin\phi$. The capacitance $C$ models the displacement
current flowing between the two superconductor electrodes. It can be estimated from the
parallel-plate capacitance formula, $C = \epsilon_0\epsilon_r A/d$, where $A$ is the junction area, $d$ is the barrier
thickness, and $\epsilon_r$ is the relative permittivity of the barrier material. For actual modeling, the
capacitance is obtained experimentally. One published result [14] is shown in Fig. 1.3. The conductance
element $G(V)$ on the right represents the quasiparticle current and the barrier leakage current. Fig.
1.4a shows a typical I-V curve for a tunnel junction. The current in the voltage state can be
approximated as a piecewise linear function of the voltage. The conductance $G(V)$ is defined as
the ratio of the current to the voltage at a point on the curve, as shown in Fig. 1.4a. For voltages
above the gap voltage, the junction has a conductance $G_n = R_n^{-1}$. For sub-gap voltages, the
conductance $G_{sg}$ is very small. Usually we use the quantity $V_m = I_c/G(2\,\mathrm{mV})$ to measure the quality of a
tunnel junction. $V_m > 40$ mV is considered good for a critical current density of 1 kA/cm².
Equivalently, $G(2\,\mathrm{mV})$ is about 15-25 times lower than $G_n$.

Figure 1.3 Specific capacitance of Nb/AlOx/Nb Josephson junctions [14].

Figure 1.4 SIS Josephson junction. (a) The static I-V characteristic and (b) the conductance G(V).

1.2.2  Static  I-V  Characteristics  of  Shunted  Josephson  Junctions 

In this section we study the I-V characteristics of a Josephson junction shunted by a constant
conductance $G$ and driven by a dc current source. The analysis below shows that, depending on the
shunt condition, the I-V curve can be either hysteretic or non-hysteretic. The
latter is used for RSFQ circuits.

We can write a differential equation for the junction equivalent circuit shown in Fig. 1.2, with
a dc current source $I$ and a constant conductance $G$:

$$I = I_c\sin\phi + GV + C\frac{dV}{dt} \qquad (1.4)$$

If we use the Josephson relation Eq. (1.2) and define a new time variable

$$\theta = \omega_c t = \frac{2e}{\hbar}\frac{I_c}{G}\,t \qquad (1.5)$$

we obtain

$$\frac{I}{I_c} = \beta_c\frac{d^2\phi}{d\theta^2} + \frac{d\phi}{d\theta} + \sin\phi \qquad (1.6)$$

where

$$\beta_c = \frac{\omega_c C}{G} \qquad (1.7)$$

is the McCumber constant.
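
To make Eqs. (1.5) and (1.7) concrete, the sketch below evaluates $\beta_c$ for a shunted junction and inverts the relation to find the shunt resistance that gives a target $\beta_c$. It uses $2e/\hbar = 2\pi/\Phi_0$ from Eq. (1.2) and $G = 1/R$. The junction values (200 μA, 0.5 pF) are hypothetical numbers chosen only for illustration, not parameters taken from this work.

```python
import math

PHI0_WB = 2.0678e-15  # flux quantum h/2e, in webers

def mccumber(ic_amp, r_ohm, c_farad):
    """beta_c = omega_c*C/G, with omega_c = (2e/hbar)*I_c/G and G = 1/R."""
    omega_c = 2.0 * math.pi * ic_amp * r_ohm / PHI0_WB
    return omega_c * r_ohm * c_farad

def shunt_for_beta(ic_amp, c_farad, beta_c=1.0):
    """Shunt resistance that yields the requested beta_c."""
    return math.sqrt(beta_c * PHI0_WB / (2.0 * math.pi * ic_amp * c_farad))

ic, cap = 200e-6, 0.5e-12  # hypothetical junction: 200 uA, 0.5 pF
r_shunt = shunt_for_beta(ic, cap)
print(f"R = {r_shunt:.2f} ohm, beta_c = {mccumber(ic, r_shunt, cap):.3f}")
```

Because $\beta_c$ grows as $R^2$, a modest change in the shunt resistance moves the junction noticeably between the overdamped and underdamped regimes.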

Now we find the average voltage $V = \langle(\hbar/2e)(d\phi/dt)\rangle$ for a given applied
dc current. We look at the two simplest cases. First, when $C = 0$, $\beta_c = 0$, Eq. (1.6) can be
integrated directly, and we obtain

$$V = 0 \quad \text{for } I < I_c$$
$$V = \frac{I_c}{G}\left[\left(\frac{I}{I_c}\right)^2 - 1\right]^{1/2} \quad \text{for } I > I_c \qquad (1.8)$$

This is shown in Fig. 1.5a. For $I > I_c$, it shows a parabolic dependence of $V$ on $I$. Notice that
for each value of $I$ there is a unique value of $V$ on the I-V curve. For the other extreme case, $\beta_c = \infty$,
the I-V curve shows a linear dependence determined by the conductance $G$. For each value of $I < I_c$
there are two values of $V$ on the I-V curve, so the curve is hysteretic. For the more general
case, $\beta_c \neq 0$, numerical calculation is needed to find the I-V relation. Fig. 1.5b shows
a normalized I-V characteristic for a junction with $\beta_c = 4$. Studies show that there is no hysteresis for
$\beta_c < 1$. When $\beta_c > 1$, hysteresis appears and increases with increasing $\beta_c$. In RSFQ circuits,
the non-hysteretic I-V characteristic is necessary for circuit operation, so junctions with
$\beta_c$ around 1 are used. Larger damping ($\beta_c \ll 1$) would slow the circuit.

Figure 1.5 Normalized I-V characteristics for a Josephson junction with (a) negligible ($\beta_c = 0$)
and dominating ($\beta_c = \infty$) capacitance, and (b) $\beta_c = 4$.
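
The closed-form result of Eq. (1.8) can be cross-checked numerically. The sketch below integrates Eq. (1.6) with $\beta_c = 0$ using a fourth-order Runge-Kutta step and compares the time-averaged $d\phi/d\theta$ (the voltage in units of $I_c/G$) with the normalized form of Eq. (1.8). The step size and integration length are arbitrary choices, and the finite averaging window leaves a small residual error.

```python
import math

def iv_overdamped(i_norm):
    """Eq. (1.8), normalized: v = V*G/I_c as a function of i = I/I_c."""
    return 0.0 if i_norm <= 1.0 else math.sqrt(i_norm**2 - 1.0)

def iv_numeric(i_norm, theta_end=500.0, dtheta=0.01):
    """Time-average of dphi/dtheta from Eq. (1.6) with beta_c = 0 (RK4)."""
    def rate(phi):
        return i_norm - math.sin(phi)
    phi = 0.0
    for _ in range(int(round(theta_end / dtheta))):
        k1 = rate(phi)
        k2 = rate(phi + 0.5 * dtheta * k1)
        k3 = rate(phi + 0.5 * dtheta * k2)
        k4 = rate(phi + dtheta * k3)
        phi += dtheta * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return phi / theta_end  # average phase velocity over the window

for i in (0.5, 1.5, 3.0):
    print(i, round(iv_overdamped(i), 3), round(iv_numeric(i), 3))
```

Below $I_c$ the phase settles to a constant and the average voltage tends to zero; above $I_c$ the numerical average follows the square-root law.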

1.2.3  Driven-Pendulum  Analog 

A driven-pendulum analog, shown in Fig. 1.6, can help to visualize the dynamics of the
Josephson junction. Assuming the pendulum arm is weightless with length $l$ and the pendulum bob
has a mass $m$, the moment of inertia of the pendulum is $M = ml^2$. The equation of motion
governing the angular acceleration of the pendulum is

$$T = M\frac{d^2\phi}{dt^2} \qquad (1.9)$$

where $\phi$ is the angle between the pendulum arm and the vertical direction. $T$ is the total torque,
which consists of three parts: 1) the applied torque $T_a$; 2) the torque produced by gravity acting on
the pendulum bob, $-mgl\sin\phi$, where $g$ is the gravitational acceleration; 3) the damping torque,
$-D\,d\phi/dt$, where $D$ is a damping constant. So

$$T_a = M\frac{d^2\phi}{dt^2} + D\frac{d\phi}{dt} + mgl\sin\phi \qquad (1.10)$$

Figure 1.6 Driven-pendulum analog for the Josephson junction.

If we compare this with Eq. (1.6), written in terms of the time $t$,

$$I = \frac{\hbar C}{2e}\frac{d^2\phi}{dt^2} + \frac{\hbar G}{2e}\frac{d\phi}{dt} + I_c\sin\phi \qquad (1.11)$$

we can see that,

1) the angle $\phi$ is the analog of the phase difference $\phi$;

2) the angular velocity $d\phi/dt$ is the analog of the voltage $V$;

3) the moment of inertia $M$ is the analog of the capacitance $C$;

4) the damping constant $D$ is the analog of the conductance $G$;

5) the maximum gravitational torque $mgl$ is the analog of the critical current $I_c$;

6) the applied torque $T_a$ is the analog of the source current $I$.

For a resistively shunted junction with $\beta_c = 1$, as used in RSFQ circuits, the
analog helps us to picture the junction switching dynamics. The junction is biased at $0.7I_c$, with a
phase close to 45 degrees. In the analog, this corresponds to a torque applied to the pendulum
that holds the bob at an angle $\phi$ of 45 degrees from the vertical. If a kick is now
applied to the pendulum, moving the bob beyond $\phi = 90$ degrees, the gravitational
torque decreases and the bob continues over the top and comes back to its original
position after several small swings near $\phi = 45$ degrees. During the whole process, the
pendulum undergoes a $2\pi$ angle change; the angular velocity reaches a maximum at a point near
$\phi = 0$ and is then reduced to zero after a few oscillations around the final equilibrium position. For
the junction, when a proper current pulse is applied, the junction is switched to its voltage
state (phase $\phi$ above $\pi/2$) and resets to its original phase plus $2\pi$. A voltage pulse
develops across the junction, with a sharp peak and some ringing when it resets.
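
The switching process described above can be reproduced with a short numerical experiment. The sketch below integrates Eq. (1.6) for $\beta_c = 1$ with a dc bias of $0.7I_c$ and a rectangular current kick. The kick amplitude ($1.5I_c$) and duration (2 units of normalized time) are assumptions chosen only so that the phase is pushed past the unstable point; they are not values from this work. In normalized units, the integral of $d\phi/d\theta$ equals the pulse area multiplied by $2\pi/\Phi_0$.

```python
import math

def simulate_switch(beta_c=1.0, bias=0.7, kick=1.5, kick_len=2.0,
                    theta_end=40.0, dtheta=0.01):
    """Integrate Eq. (1.6) with RK4 for a rectangular current kick.

    Returns (phase change, integral of dphi/dtheta, peak dphi/dtheta).
    """
    def accel(phi, v, theta):
        drive = bias + (kick if theta < kick_len else 0.0)
        return (drive - math.sin(phi) - v) / beta_c

    phi0 = math.asin(bias)  # static phase at the dc bias point
    phi, v, theta = phi0, 0.0, 0.0
    area = peak = 0.0
    for _ in range(int(round(theta_end / dtheta))):
        k1p, k1v = v, accel(phi, v, theta)
        k2p = v + 0.5 * dtheta * k1v
        k2v = accel(phi + 0.5 * dtheta * k1p, k2p, theta + 0.5 * dtheta)
        k3p = v + 0.5 * dtheta * k2v
        k3v = accel(phi + 0.5 * dtheta * k2p, k3p, theta + 0.5 * dtheta)
        k4p = v + dtheta * k3v
        k4v = accel(phi + dtheta * k3p, k4p, theta + dtheta)
        v_new = v + dtheta * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        phi += dtheta * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        area += 0.5 * dtheta * (v + v_new)  # trapezoidal integral of the voltage
        v, theta = v_new, theta + dtheta
        peak = max(peak, v)
    return phi - phi0, area, peak

dphi, area, peak = simulate_switch()
print(f"phase change = {dphi / (2 * math.pi):.3f} x 2*pi")
print(f"pulse area   = {area / (2 * math.pi):.3f} x Phi_0")
```

With these settings the phase should settle one full turn beyond its starting value, the pendulum going over the top once and coming to rest, and the integrated voltage should correspond to one flux quantum.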

1.2.4  Single  Flux  Quantum 

We now introduce the concept of magnetic flux quantization in a superconductor
loop. It is another unique macroscopic quantum-mechanical property of a superconductor.
The Cooper pairs in the superconductor can be described by a boson wave function

$$\psi(\mathbf{r}) = |\psi(\mathbf{r})|\,e^{i\theta(\mathbf{r})} \qquad (1.12)$$

where the phase has to obey the equation

$$\hbar\nabla\theta = e^*\Lambda\mathbf{J}_s + e^*\mathbf{A} \qquad (1.13)$$

with

$$\Lambda = m^*/n^*e^{*2} \qquad (1.14)$$

In the superconductive ring shown in Fig. 1.7, if we integrate Eq. (1.13) along a closed path $C$
(marked as the dashed line) lying inside the superconductor and surrounding the non-superconductive
hole, we have

$$\hbar\oint_C \nabla\theta\cdot d\mathbf{l} = e^*\oint_C(\Lambda\mathbf{J}_s + \mathbf{A})\cdot d\mathbf{l} \qquad (1.15)$$

Figure 1.7 Contour of integration within a superconductive ring.

The phase $\theta$ of the wave function is unique, or differs by a multiple of $2\pi$, at each point. So the
left-hand side of Eq. (1.15) becomes $\hbar\cdot 2\pi n = nh$, where $n$ is an integer. The integral on the
right-hand side is London's fluxoid. If the path is deep inside the superconductor (more than a few
penetration depths away from the surface), $\mathbf{J}_s = 0$, so the right-hand side of Eq. (1.15) becomes

$$e^*\oint_C \mathbf{A}\cdot d\mathbf{l} = e^*\int_S(\nabla\times\mathbf{A})\cdot d\mathbf{S} = e^*\int_S\mathbf{B}\cdot d\mathbf{S} = e^*\Phi_S \qquad (1.16)$$

where Stokes' theorem is used for the first equality and $\Phi_S$ is the magnetic flux enclosed by the
contour $C$. So

$$\Phi_S = nh/e^*, \quad \text{where } n = 0, \pm1, \pm2, \pm3, \ldots \qquad (1.17)$$

The magnetic flux here is quantized in units of $h/|e^*|$, which is called the magnetic flux quantum
and is expressed by the constant

$$\Phi_0 = h/2e = 2.0679\times10^{-15}\ \mathrm{Wb} \qquad (1.18)$$

This result is well established experimentally.
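
The numerical value of the flux quantum follows directly from $h/2e$. The sketch below evaluates it from the exact SI values of the Planck constant and the elementary charge and converts it to the mV·ps unit that is convenient for SFQ pulses. The variable names are illustrative.

```python
H_PLANCK = 6.62607015e-34   # Planck constant, J*s (exact in the 2019 SI)
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact in the 2019 SI)

phi0_wb = H_PLANCK / (2.0 * E_CHARGE)  # Cooper pair charge |e*| = 2e
phi0_mv_ps = phi0_wb * 1e3 * 1e12      # 1 Wb = 1 V*s = 1e3 mV * 1e12 ps

print(f"Phi_0 = {phi0_wb:.4e} Wb = {phi0_mv_ps:.2f} mV*ps")
```

The result agrees with the value quoted in Eq. (1.18) to within one unit in the last digit, and it shows why a pulse a few picoseconds wide has an amplitude on the order of a millivolt.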


A properly shunted junction can generate a single-flux-quantum pulse when it switches. As
discussed in Sec. 1.2.3, if a tunnel junction is biased near its critical current, the junction will
switch when given a proper input pulse, and the phase of the junction changes by $2\pi$; a voltage pulse is
generated across the junction during the switching. The time integral of the voltage pulse,
$\int V(t)\,dt$, is equal to a flux quantum $\Phi_0$. Such a pulse is called a single-flux-quantum (SFQ)
pulse.
1.3 Basic RSFQ Gates and Logic Presentation

RSFQ circuits are composed of junctions, inductors and bias resistors. In addition, each junction
is shunted with an external resistor. The value of $\beta_c$ is usually chosen to be about 1.0 so that
the shunted junction has a non-hysteretic static I-V characteristic. The researchers at Northrop
Grumman chose to use $\beta_c \approx 2$, which gives a higher $I_cR$ product. RSFQ pulses can be generated,
transferred and stored in the circuits, depending on how the junctions are biased and how the inductor
values are chosen.

All the basic RSFQ circuit components can be divided into two categories: asynchronous
components and synchronous components.

Asynchronous components are not clocked and include simple elements such as active Josephson
transmission lines (JTLs), splitters, buffers, and confluence buffers. They are used as the
connections, the forks and the mergers in the logic. The more complicated toggle flip-flop (T flip-flop),
which has an internal memory, is also an asynchronous circuit. Asynchronous circuits are transparent
to the input signals; the signals ripple through them, and the outputs are generated shortly after the
inputs arrive. They are used for connections and in sequential logic.

Synchronous components are clocked. All synchronous components contain internal memory.
The incoming data set the logic states of the internal memories. The information is stored
there until the arrival of a clock pulse releases it to the output. The basic synchronous components
are the latches. Two widely used latches, the RS flip-flop and the D2 flip-flop, are discussed below; there
are other latches not discussed here. Most synchronous RSFQ gates are formed as combinational
logic followed by a latch.

An RSFQ circuit represents bit information in its own unique way. The convention for
RSFQ logic presentation will be discussed in this chapter.


1.3.1  Asynchronous  RSFQ  Circuit  Components 

The simplest component is the Josephson transmission line (JTL), which is used as an
interconnection in RSFQ circuits. Figure 1.8 shows a few stages of a JTL. The circuit parameters are
chosen so that $I_cL_s = \Phi_0/2$, where $I_c$ is the critical current of the junction. The dc bias current is
set to about $0.7I_c$, which corresponds to a phase drop of about $\pi/4$ across the junction. When an SFQ
voltage pulse arrives at a junction, the junction switches, and the SFQ pulse is reproduced and
propagates along the JTL. Both the inductance $L_s$ and the dc bias level can be adjusted to achieve
different propagation delays. Besides providing interconnection, JTLs can reshape SFQ pulses and even
amplify their voltage if progressively larger $I_c$ values or higher dc bias levels are
chosen along the JTL. For a compact layout, two stages of a JTL usually share a common dc bias current
supply, as shown in Fig. 1.9. The dc bias is inserted in the middle of the connecting inductor
between the two stages. This arrangement doesn't affect the circuit dc bias margins or the circuit
dynamics. JTLs are bidirectional: pulses can propagate from either end to the other.

Figure 1.8 A few stages of a Josephson transmission line (JTL). The $I_b$ are the dc bias currents
to the junctions; the $L_s$ are the JTL inductances connecting to the next stage.

Figure 1.9 A compact two-stage JTL formed by sharing one dc bias input line between two
neighboring stages.
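
The two JTL design rules above translate into numbers as follows. The sketch computes the stage inductance from $I_cL_s = \Phi_0/2$ and the static junction phase implied by a bias of $0.7I_c$ through Eq. (1.1). The 200 μA critical current is an illustrative value, and the function names are hypothetical.

```python
import math

PHI0_WB = 2.0678e-15  # flux quantum, in webers

def jtl_inductance_h(ic_amp):
    """Stage inductance from the design rule I_c * L_s = Phi_0 / 2."""
    return PHI0_WB / (2.0 * ic_amp)

def bias_phase_deg(bias_fraction):
    """Static junction phase for a dc bias I_b = bias_fraction * I_c, from Eq. (1.1)."""
    return math.degrees(math.asin(bias_fraction))

ls = jtl_inductance_h(200e-6)  # 200 uA is an illustrative critical current
phase = bias_phase_deg(0.7)
print(f"L_s = {ls * 1e12:.2f} pH, bias phase = {phase:.1f} deg")
```

The inductance comes out at a few picohenries, and the bias phase is just under 45 degrees, which is the "$\pi/4$ phase drop" mentioned in the text.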


Shown in Fig. 1.10a is an SFQ pulse splitter. It provides the function of a fork. The junctions
$J_1$, $J_2$ and $J_3$ are biased close to their critical currents. An SFQ pulse from the input A switches
$J_1$, and the resulting pulse current is divided between $L_2$ and $L_3$ to switch $J_2$ and $J_3$. A pulse is
produced at each of the outputs B and C. Like the JTL, the pulse splitter doesn't protect its input
from signals at its outputs. The two circuit components discussed below, in contrast, allow only
one-directional transfer of SFQ pulses from input to output.


A simple buffer stage is shown in Fig. 1.10b. $I_{c1}$ is larger than $I_{c2}$, so $J_1$ is biased closer to its
critical current than $J_2$ by $I_b$. When an SFQ pulse arrives at the input A, the incoming pulse current
adds to the bias current to switch $J_1$. For $J_2$, however, the direction of the incoming pulse current is
opposite to that of the bias current; the two currents tend to cancel each other, and $J_2$ stays in the
zero-voltage state. So the SFQ voltage pulse produced at the top of $J_1$ appears at the top of $J_2$ and
propagates to the output B. On the other hand, if an SFQ pulse arrives at the output B, the incoming
current adds to the bias current of both $J_1$ and $J_2$. But since $J_2$ has the smaller $I_c$, it is switched
first and set to the high-impedance state. The bias current for $J_2$ is then temporarily shut off, and
$J_1$ stays unswitched for the duration of the incoming pulse. Pulses from the output B are therefore
absorbed by $J_2$ and cannot reach the input A.
Shown in Fig. 1.10c is a confluence buffer, which merges the pulses from the two inputs A and
B into a single output C. As we can see, each incoming branch is like a buffer stage. If a pulse
comes from input A, $J_1$ is switched, while $J_3$ stays unswitched. The SFQ pulse produced at the top
of $J_1$ then propagates through $J_3$ and $L_3$ to switch $J_5$, so the pulse is reproduced at the output C.
Meanwhile, the input B is protected from the pulse propagating from the input A to the output, since $J_4$
absorbs the current caused by the pulse. Likewise, an SFQ pulse coming from input B is
reproduced and propagates to the output C. For correct functioning of the confluence buffer, pulses
coming from A have to be separated by a certain delay from pulses coming from B. If a pulse from A is
too close to a pulse from B, only one pulse with larger amplitude is generated at the output C
instead of the two expected.

Figure 1.10 Some asynchronous RSFQ circuit components. (a) SFQ pulse splitter: $I_{c2} = I_{c3} = I_c$,
$I_{c1} = 1.4I_c$, $I_{bj} = 0.75I_{cj}$, $L_2 = L_3 = 0.6\,\Phi_0/I_c$. (b) Simple buffer stage: $I_{c1} = 1.4I_{c2}$,
$I_b = 0.7I_{c2}$. (c) Confluence buffer: $I_{c3} = I_{c4} = I_{c5} = I_c$, $I_{c1} = I_{c2} = 1.4I_c$, $I_{b1} = 1.4I_c$,
$I_{b2} = 0.7I_c$, $L_3 = 0.5\,\Phi_0/I_c$.

We now introduce a more complicated asynchronous component in RSFQ circuits,
the T flip-flop. It contains a storage loop, which is absent in the asynchronous components
discussed so far. As shown in Fig. 1.11, a T flip-flop has one input and two outputs. The
input pulses going to the T flip-flop are alternately diverted to the two outputs, so a T flip-flop can
function as a 2-bit counter. In the circuit schematic diagram, $J_1$, $J_3$ and $L_1$-$L_2$ form a storage loop.
The storage loop has two states according to the direction of the circulating current flowing in it: if
the current circulates clockwise, it is state "1"; if counter-clockwise, it is state "0". The storage
loop flips its state for each input pulse. Quiescently, $I_b$ is unevenly divided between $J_1$ and $J_3$. We
can view the dc bias currents $I_{J1}$ and $I_{J3}$ in $J_1$ and $J_3$ as a superposition of $I_b/2$ and a
counter-clockwise circulating current $I_{cir}$. If the storage loop is in state "0" and a pulse arrives at the input
A, the current passing through $J_2$, added to $I_{J1}$, exceeds $I_{c1}$ and switches $J_1$ into its voltage state,
and an RSFQ pulse is produced at $F_0$. At the same time, the current passing through $J_4$
switches $J_4$; $J_3$ remains in the zero-voltage state and no output pulse is generated at $F_1$. For the
storage loop, after $J_1$ is switched to its high-impedance state, the bias current $I_{b1}$ is redirected to $L_1$-$L_2$
and $J_3$. The loop now contains a clockwise circulating current and is switched to state "1". Now $J_3$
is biased close to $I_{c3}$ and $J_1$ is biased to a low phase. Similarly, if an input pulse now arrives at the
input, the input current switches $J_2$ and $J_3$, an output pulse is produced at $F_1$, and the storage
loop resets to state "0".


Figure 1.11 A T flip-flop. Example values: Ic1 = 279 µA, Ic2 = 251 µA, Ic3 = 356 µA, Ic4 = 224 µA, Ic5 = 264 µA, L1 = 2.95 pH, L2 = 2.38 pH, L3 = 4.04 pH, L4 = 3.87 pH, L5 = 1.11 pH, R = 1.15 Ω, Ib1 = 297 µA, Ib2 = 311 µA.
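
The alternating behaviour described above can be summarized at the logic level. The following is a minimal behavioural sketch in Python, not a circuit simulation: the class name `TFlipFlop` and the output labels are illustrative assumptions, and junction dynamics are ignored entirely.

```python
class TFlipFlop:
    """Logic-level model of the T flip-flop: each input pulse flips the
    storage loop and fires one of the two outputs."""

    def __init__(self):
        # 0: counter-clockwise circulating current, 1: clockwise
        self.state = 0

    def pulse(self):
        """Apply one input pulse and return the output that fires."""
        if self.state == 0:
            self.state = 1   # J1 switches, pulse appears at F0
            return "F0"
        self.state = 0       # J3 switches, pulse appears at F1
        return "F1"


tff = TFlipFlop()
outputs = [tff.pulse() for _ in range(4)]
print(outputs)
```

Each output therefore carries every second input pulse, which is the divide-by-two action used when the cell serves as a counter stage.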
1.3.2  Synchronous  RSFQ  Circuit  Components

Figure 1.12 shows a key component, the RS flip-flop, which is the simplest latch in RSFQ circuits. The core of the circuit is a two-junction interferometer J3-L-J4 with IcL = 1.25Φ0, so that it can store a flux quantum. The interferometer has two states, “0” and “1”, corresponding to a circulating current Ip = Φ0/2L flowing counter-clockwise or clockwise in the loop. The junction currents in the loop can be expressed as the sum of one half of the dc bias current and the circulating current, IJ3 = (Ib/2) + Ip and IJ4 = (Ib/2) - Ip. Initially, the circuit is biased to state “0”; with the sample circuit parameter values, IJ3 = 0.8Ic, IJ4 = 0, IJ1 = 0 and IJ2 = 0. Pulses applied to the S and R inputs set the circuit to state “1” and reset it to state “0”, respectively. When a pulse arrives on the S (set) input, its current passes through J2, adds to the initial bias current in J3 and switches J3 to its high-impedance voltage state. The dc bias current is therefore redirected to L-J4 and the circulating current Ip reverses sign, so that IJ4 = (Ib/2) - Ip = 0.8Ic, while J3 resets to the superconducting state with IJ3 = 0. The circulating current is now clockwise, and the circuit is set to state “1”. When a pulse arrives at the R (reset) input while the circuit is in state “1”, it passes through L1 and J1 and switches J4 to its high-impedance state, so Ib returns to J3, resetting the circuit to the “0” state. At the same time an RSFQ pulse is released to the output F.

J1 and J2 have lower critical currents than J3 and J4, and this prevents the circuit from malfunctioning when unwanted pulses arrive. When the circuit is in state “1”, if there is a pulse coming from the S input, J2 will be switched instead of J3; the incoming pulse is absorbed by J2 and the storage loop state remains unaffected. Likewise, if there is a pulse coming from the R input when the circuit is in state “0”, J1 is switched instead of J4, no output pulse is produced at F, and the storage loop stays in its original state. When the clock is fed to R and data are fed to S, the RS flip-flop functions as a single-rail latch.

Figure 1.12 An RS flip-flop. Example values: Ic1 = Ic2 = Ic, Ic3 = Ic4 = 1.41 Ic, Ib = 0.8 Ic, L = 1.25 Φ0/Ic.

Figure 1.13 A D flip-flop: (a) circuit diagram and (b) the Moore diagram for its operation.
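
As a quick check of the bias arithmetic and the set/reset behaviour, here is a small Python sketch. It uses the example values of Fig. 1.12 in normalized units (currents in units of Ic, inductance in units of Φ0/Ic); the names `branch_currents` and `RSFlipFlop` are illustrative assumptions, and the model is purely behavioural.

```python
IB = 0.8               # dc bias current, in units of Ic
L = 1.25               # storage inductance, in units of Phi0/Ic
IP = 1.0 / (2.0 * L)   # circulating current Phi0/(2L), in units of Ic


def branch_currents(state):
    """Return (I_J3, I_J4) in units of Ic for loop state 0 or 1."""
    sign = 1.0 if state == 0 else -1.0   # Ip reverses between the two states
    return IB / 2 + sign * IP, IB / 2 - sign * IP


class RSFlipFlop:
    """Logic-level model: S sets the loop, R resets it and reads it out."""

    def __init__(self):
        self.state = 0

    def set(self):
        # In state 1 an extra S pulse is absorbed by J2, so nothing changes.
        self.state = 1

    def reset(self):
        """Apply an R pulse; return True if a pulse is emitted at F."""
        fired = self.state == 1   # in state 0 the pulse is absorbed, no output
        self.state = 0
        return fired


ff = RSFlipFlop()
ff.set()
ff.set()                  # second set pulse is absorbed
first = ff.reset()        # loop was in state 1
second = ff.reset()       # loop already in state 0
print(branch_currents(0), branch_currents(1), first, second)
```

With data on S and clock on R, `reset()` plays the role of the clock: it releases a stored "1" to F and leaves the loop empty for the next period.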

In RSFQ circuits, it is sometimes advantageous to use dual-rail signals. The D flip-flop is a latch that accepts a single-rail input and produces dual-rail outputs. As we can see in Fig. 1.13a, the D flip-flop is much more complicated than the RS flip-flop since it has to derive the complementary output from the input signal. The main storage loop is J7-L4-L5-J5. It has two states. Initially, the current circulates counter-clockwise, J7 is biased close to its critical state, and J5 has a phase close to zero. A pulse arriving at the Data input will switch J7 and set the loop to state “1”, switching the circulating current in the loop to clockwise and biasing J5 close to its critical state. Now a pulse arriving at the Clock input will switch J5 and J3 sequentially, generating an output pulse at Out, and the circuit state is reset to “0”. If a clock pulse arrives during state “0”, J4, J2 and J1 will be switched sequentially and an output pulse is generated at the complementary output (Out with an overbar in Fig. 1.13) instead of at Out. The operation described above can be understood more clearly from the Moore diagram shown in Fig. 1.13b.
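
The state transitions just described can be written as a small state machine. The sketch below is a behavioural illustration only; the function name `d_flip_flop` and the label `"Out_bar"` for the complementary output are assumptions, and a second Data pulse in state "1" is assumed to leave the state unchanged, which the text does not specify.

```python
def d_flip_flop(events):
    """Return the sequence of outputs fired for a list of 'data'/'clock' events."""
    state = 0
    fired = []
    for event in events:
        if event == "data":
            state = 1                    # J7 switches, loop stores a flux quantum
        elif event == "clock":
            # State 1: pulse at Out. State 0: pulse at the complementary output.
            fired.append("Out" if state == 1 else "Out_bar")
            state = 0                    # the clock always leaves the loop in state 0
        else:
            raise ValueError(f"unknown event: {event!r}")
    return fired


result = d_flip_flop(["data", "clock", "clock", "data", "clock"])
print(result)
```

Exactly one of the two rails fires per clock pulse, which is what makes the output dual-rail.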

1.3.3  Interconnect 

JTLs are broadly used for on-chip interconnect between blocks with short separation. They have the advantage of regenerating and reshaping the SFQ pulse. But for chip-to-chip interconnection, on-chip long-distance interconnection, and in recent years even on-chip short-distance interconnection, passive transmission lines (PTLs), either microstrip lines or striplines, are used. A JTL has a delay of a few picoseconds per stage. For long interconnections, the total delay is large and hard to control because of process variation and thermal jitter, and routing is difficult. In contrast, signal transmission in a PTL is ballistic, with a very short delay (a few ps/mm), and routing is much easier. Special driver and receiver circuits [5][15][16] are needed at the two ends of a PTL to launch and accept the SFQ pulses. The transceiver circuits are usually connected to JTL stages that shape the SFQ pulses. Efforts are being made to integrate the transceiver circuits into the basic RSFQ gate library to facilitate broad use of PTL interconnection [5]. Another point to note when using PTL interconnection is the need for proper shielding to avoid crosstalk. The SFQ pulse energy is very small, and fewer than 10 crossovers can make the SFQ pulse disappear completely due to capacitive coupling [5].

1.3.4  The  Interface  Circuits 

In  RSFQ  circuits,  data  are  carried  by  the  SFQ  pulses.  But  in  many  other  types  of  circuits,  volt¬ 
age  levels  "high"  and  "low"  are  used  to  represent  "1"  and  "0".  So  when  RSFQ  circuits  are  used 
with  such  other  circuits,  interface  circuits  are  needed  to  convert  the  signals  between  the  two  forms. 
There  are  many  ways  to  construct  a  DC/SFQ  converter  and  an  SFQ/DC  converter.  In  this  section, 
we  are  going  to  introduce  two  examples. 
Figure 1.14 A DC/SFQ converter: (a) circuit diagram, (b) waveforms and (c) illustrations of return-to-zero (RZ) and non-return-to-zero (NRZ) data.


A DC/SFQ converter transforms a voltage waveform into a series of SFQ output pulses. Fig. 1.14a shows the circuit diagram of a DC/SFQ converter, and Fig. 1.14b shows its input and output waveforms. For this circuit, the dc input has a return-to-zero (RZ) waveform, which means that for each "1" the waveform first goes high but must fall back to the low level before the next digit. A comparison of the waveforms for RZ data and non-return-to-zero (NRZ) data is shown in Fig. 1.14c. For each rise in the input waveform, which is a "1", an SFQ pulse is generated at the output. Let’s take a close look at how the circuit actually realizes this conversion. When its input is raised above a certain level Iup, the critical state of J3 is reached and an SFQ pulse is generated across it. At the same time, the internal interferometer is switched to another flux state. In order to reset it to the initial state, the input current has to be reduced below a certain level Idown. Both J1 and J2 are then triggered through a 2π phase leap, and J1 is biased to its initial state. This happens while the input returns to zero; in fact, Idown is less than zero. This design was originally done by Polonsky et al. [17]. Simulations and experiments show that this converter has larger margins (up to ±60% in simulation) than other variations.

Figure 1.15 An SFQ/DC converter: (a) the circuit diagram and (b) the waveforms of its input and outputs.
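
The two-threshold behaviour of the DC/SFQ converter amounts to a hysteretic edge detector. The Python sketch below illustrates this under stated assumptions: the function name `dc_to_sfq`, the sampled-waveform representation and the numerical thresholds are all hypothetical, chosen only so that Idown is below zero as in the text.

```python
def dc_to_sfq(samples, i_up, i_down):
    """Return the sample indices at which an SFQ pulse is emitted.

    A pulse is emitted when the input rises above i_up; the converter is
    re-armed only after the input has fallen below i_down.
    """
    armed = True
    pulses = []
    for k, i_in in enumerate(samples):
        if armed and i_in > i_up:
            pulses.append(k)      # J3 reaches its critical state and fires
            armed = False
        elif not armed and i_in < i_down:
            armed = True          # interferometer reset to its initial flux state
    return pulses


I_UP, I_DOWN = 0.8, -0.1          # hypothetical thresholds, arbitrary units
HIGH, LOW = 1.0, -0.2             # hypothetical input levels

# RZ encoding of the bits 1, 0, 1, 1: every "1" goes high and then returns low.
rz = [HIGH, LOW, LOW, LOW, HIGH, LOW, HIGH, LOW]
# Two consecutive "1"s without a return to the low level (NRZ-like input).
nrz = [HIGH, HIGH]

rz_pulses = dc_to_sfq(rz, I_UP, I_DOWN)
nrz_pulses = dc_to_sfq(nrz, I_UP, I_DOWN)
print(rz_pulses, nrz_pulses)
```

The NRZ-like input yields only one pulse for two "1"s, which illustrates why this converter needs RZ data.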

An SFQ/DC converter does the reverse of a DC/SFQ converter: SFQ input pulses are converted to a voltage waveform at the output. Fig. 1.15 shows a T flip-flop-based SFQ/DC converter and its input and output waveforms. The output waveform needs some explanation since it is neither a standard RZ nor a standard NRZ waveform. Each transition in the output waveform represents a "1", corresponding to an input SFQ pulse. As we can see from the circuit diagram, this converter is based on a T flip-flop. Junctions J5 and J6 are inserted in the middle of the T flip-flop storage loop to read the T flip-flop state. If the basic interferometer is in state "0", only a small current flows through J6 and J5, so the voltage across J5 is zero. When the storage loop switches to state "1", a larger current from Ib1 flows through the J6, J5 branch, adding to the bias current from Ib. This drives J5 into its voltage state, and an average voltage is developed across it. So for each input SFQ pulse, the T flip-flop reverses its storage state and the voltage across J5 switches between "zero" and "high", producing a transition in the output waveform. The typical amplitude of the output waveform is about 100 µV for the 1 kA/cm² Nb process, so it usually requires some pre-amplification, either on-chip or off-chip, before it is fed to the oscilloscope. Such SFQ/DC converters have been tested experimentally with large margins (±30%), which agrees with the simulation results; see, e.g., Kaplunenko et al. and Polonsky et al. [17][18].
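
Because a "1" is represented by a transition rather than by a level, the output has to be decoded differentially. The following Python sketch shows the encoding performed by the converter and the matching decoder; the function names and the one-sample-per-bit-period representation are illustrative assumptions.

```python
def sfq_to_dc_levels(bits, initial_level=0):
    """Output level after each bit period: every SFQ pulse ("1") toggles the level."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit:
            level ^= 1            # the T flip-flop reverses its storage state
        levels.append(level)
    return levels


def decode_levels(levels, initial_level=0):
    """Recover the bits: a change of level between periods means a "1"."""
    bits = []
    previous = initial_level
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits


data = [1, 0, 1, 1, 0]
levels = sfq_to_dc_levels(data)
recovered = decode_levels(levels)
print(levels, recovered)
```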


1.3.5  The  RSFQ  Information  Presentation  and  Logic  Gates 


An RSFQ gate such as an AND gate, OR gate or inverter can be constructed from a combination of asynchronous circuits and a latch at the end. Since data are represented by picosecond pulses instead of voltage levels, RSFQ logic uses its own convention for clocking and for the decision of logic gates. Fig. 1.16a shows a block diagram of a general RSFQ clocked gate. S1, S2, ..., Sn are the inputs to the gate, T is the clock, and Sout is the output. Fig. 1.16b shows the timing diagram of the signals for an OR gate with two inputs S1, S2 and one output Sout. The time interval between two clock pulses is one clock period τ. If a pulse arrives on the input Sn at any time during the clock period, it is considered a “1”. The absence of an input pulse at Sn in the clock period represents a “0”. The order of arrival of the inputs doesn’t matter. Usually the gate has several internal logic states. The inputs together set the gate to a certain logic state during the period. The gate holds the evaluation until the arrival of the clock pulse ending the period. A pulse or no pulse then appears at the output Sout accordingly, and the gate resets to its original internal state. For the OR gate in Fig. 1.16b, a pulse arrives at S1 and no pulse arrives at S2 between the two clock pulses, i.e., “1” for S1 and “0” for S2. So after the arrival of the second clock pulse at the beginning of the next clock period, a pulse is produced at Sout, representing “1”, which is the correct function of an OR gate. For the gate to function properly, input pulses should arrive after the first clock pulse by a delay thold, which lets the gate reset its logic state, and before the second clock pulse by a time tsetup, which lets the gate fully set up the internal logic state corresponding to the inputs.

Figure 1.16 A general RSFQ gate: (a) the block diagram and (b) the timing diagram of the input pulses on S1 and S2 arriving between two clock pulses and the output pulse at Sout produced at the end of the clock period for an OR gate.
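
The clocking convention above can be illustrated with pulse arrival times. The Python sketch below evaluates a clocked OR gate period by period and flags hold and setup violations; the function name `clocked_or`, the dictionary layout and all numerical times are hypothetical and serve only as an illustration of the convention.

```python
def clocked_or(clock_times, inputs, t_hold, t_setup):
    """Return one output bit per clock period for a clocked OR gate.

    clock_times: sorted clock pulse times.
    inputs: dict mapping an input name to a list of pulse arrival times.
    """
    outputs = []
    for t_start, t_end in zip(clock_times, clock_times[1:]):
        bit = 0
        for name, arrivals in inputs.items():
            for t in arrivals:
                if not (t_start <= t < t_end):
                    continue          # pulse belongs to another clock period
                if t < t_start + t_hold or t > t_end - t_setup:
                    raise ValueError(f"timing violation on {name} at t = {t}")
                bit = 1               # any pulse in the period counts as "1"
        outputs.append(bit)           # released by the clock pulse ending the period
    return outputs


clock = [0, 100, 200, 300]            # ps, hypothetical 10 GHz clock
pulses = {"S1": [30, 250], "S2": [260]}
result = clocked_or(clock, pulses, t_hold=5, t_setup=5)
print(result)
```

Note that the order of arrival within a period has no influence on the result, as stated in the text.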

The  delay  (D)  gate  implemented  by  the  RS  flip-flop  shown  in  Fig.  1.12  is  the  simplest  clocked 
gate  in  RSFQ  circuits.  If  we  feed  data  to  the  S  terminal,  and  clock  to  the  R  terminal,  the  RS  flip-flop 
behaves  like  a  latch.  Any  data  arriving  at  the  input  in  one  clock  period  will  set  the  internal  logic 
state  of  the  RS  flip-flop  and  be  released  to  the  output  at  the  beginning  of  the  next  clock  period. 
JTLs  can  be  combined  with  the  RS  flip-flop  to  change  the  delay  of  the  gate.  The  D2  flip-flop  is 
another  D  gate  with  the  dual-rail  outputs. 
CHAPTER  2

Technology  Scaling  and  UCB 
High-Jc  Niobium  Process 


2.1  Technology  Scaling 

The speed of RSFQ circuits scales up with the IcR product of the Josephson junction, where Ic is the critical current of the junction and R is its shunt resistance. For low-Tc Nb-AlOx-Nb tunnel junctions, an external shunt resistor is connected in parallel with the junction to make βc equal to 1. When βc = 1, IcR is proportional to (Jc)^(1/2), independent of Ic of the junction. So the higher Jc, the higher the IcR of the junctions and the faster the RSFQ circuits we can achieve. At the same time, if we keep the same Ic for the circuits, the junction size will be smaller. Assuming we can also scale down the size of the inductors and the shunt resistors, the density of the circuits on a chip will increase. The power consumption of each circuit is determined by Ic and the dc supply voltage rather than by Jc. So the circuit power dissipation stays the same with the scaling of Jc, but the power density scales with the circuit density on the chip. For this thesis project, we had designs for both 1 kA/cm² and 6.5 kA/cm² Nb processes. We focused on junction scaling to achieve higher circuit speed, while leaving the size of inductors and resistors unchanged. Shrinking the inductors and resistors is difficult because of the need to control process variation. Layouts of some 1 kA/cm² designs can be modified for the 6.5 kA/cm² implementation simply by changing the sizes of the junctions, if some margin loss is allowed. Many groups are striving to make high-Jc junctions with small spreads [19][20][21][22][23].
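
The square-root dependence of IcR on Jc follows directly from the condition βc = 1. The short derivation below is a sketch that introduces two symbols not used in the text, the junction area A and the specific capacitance Cs (capacitance per unit area), and assumes that Cs does not change appreciably with Jc.

```latex
\beta_c = \frac{2\pi I_c R^2 C}{\Phi_0} = 1,
\qquad I_c = J_c A, \qquad C = C_s A
\;\;\Longrightarrow\;\;
(I_c R)^2 = \frac{\Phi_0 I_c}{2\pi C} = \frac{\Phi_0 J_c}{2\pi C_s}
\;\;\Longrightarrow\;\;
I_c R = \sqrt{\frac{\Phi_0 J_c}{2\pi C_s}} \;\propto\; \sqrt{J_c}.
```

The area A cancels, which is why the result is independent of Ic. Under the same assumption, going from 1 kA/cm² to 6.5 kA/cm² would raise IcR, and hence the circuit speed, by a factor of about √6.5 ≈ 2.5.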

Besides the low-Tc Nb process, SNS junctions and high-Tc YBCO junctions are two alternative technologies in which RSFQ circuits can be implemented. Both have intrinsically non-hysteretic I-V curves. The state-of-the-art IcR in these technologies is comparable with that of the Nb process, and βc can be much less than 1, depending on the process. The penta-layer Nb/NbTiN/TaN/NbTiN/Nb SNS junction has a similar sandwich structure [24][25]. The TaN barrier layer is a conductor, which by itself provides a constant internal shunt resistance for the junction. The absence of an external shunt resistor saves area and reduces parasitic inductances. YBCO junctions can operate at a higher temperature than Nb junctions, which is valuable for some applications. Since YBCO junctions are formed with different geometric structures, even without the external shunt resistance, the parasitic inductance values are large enough to affect circuit performance. Thermal noise and process variation are two other factors that limit the complexity of circuits built with YBCO junctions.

2.1.1  RSFQ  Circuit  Speed  vs.  ICR  Product 

We can relate the junction switching speed to IcR qualitatively through the following analysis. Let’s recall the junction CRSJ equivalent circuit model shown in Fig. 1.2. The leftmost branch is the junction supercurrent I = Ic sin φ, which can be viewed as a nonlinear inductance. The voltage V across the junction can be related to the total equivalent inductance LJt by the equation V = d[LJt(I) I]/dt, where I is the instantaneous pair current. Using Eq. (1.1) and (1.2), V can be expressed as

    V = \frac{\Phi_0}{2\pi} \frac{d}{dt}\left[\sin^{-1}\left(\frac{I}{I_c}\right)\right]    (2.1)

so that

    L_{Jt} = L_J \frac{\sin^{-1}(I/I_c)}{I/I_c}    (2.2)

where

    L_J = \frac{\Phi_0}{2\pi I_c}    (2.3)

Figure 2.1 The RCL equivalent circuit for the shunted junction in RSFQ circuits when the junction supercurrent is viewed as a nonlinear inductance. Here the constant inductance LJ is used as an approximation.

LJt varies from LJ to (π/2)LJ when I changes from 0 to Ic, so we can use LJ as a measure of the junction equivalent inductance. For Ic = 125 µA, LJ = 2.64 pH. Now the junction equivalent circuit can be viewed as a parallel RCL combination as shown in Fig. 2.1. There are two time constants for this combination, LJ/R = Φ0/(2πIcR) and RC. The junction switching speed is determined by the larger of these two time constants. When the two time constants are equal, βc = RC/(LJ/R) = 2πIcR²C/Φ0 = 1, the junction is critically damped in the case without any loading and has the optimal switching speed for fixed Ic and C. With βc around 1, when βc < 1 the pulse main lobe is wider than in the case βc = 1, but when βc > 1 the envelope of the ringing tail of the SFQ pulse decays more slowly. So βc = 1 is the optimal case. Of course the actual switching dynamics are much more complicated since it is a nonlinear process. Moreover, in the circuits each junction has a different loading, which requires an individual optimal shunt condition slightly different from βc = 1. Normally in low-Tc Nb RSFQ circuits, people choose the same βc around 1 for all junctions since it is difficult to define the loading and find the individual optimal βc for each junction. βc < 1 is required for the junction to have a non-hysteretic I-V characteristic, which guarantees the reset of the junction after the generation of an SFQ pulse. In this case, the junction switching speed is determined by the time constant LJ/R. We define a time unit τ0 = LJ/R = Φ0/(2πIcR). τ0 is inversely proportional to IcR. So the higher IcR, the smaller τ0 is, the faster the junction switches and the narrower the full width at half maximum (FWHM) of the SFQ pulse. In typical RSFQ circuits, the SFQ pulse FWHM is about 4τ0, and the maximum speed of the circuits ranges from 1/(40τ0) to 1/(25τ0), since enough time has to be left between consecutive data pulses, or between the data pulse and the clock pulse in a clocked gate, to avoid pulse interference.
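
To get a feel for the numbers, the estimates above can be evaluated directly. The Python sketch below computes LJ for Ic = 125 µA and then τ0, the approximate pulse width and the speed range for IcR = 0.6 mV, the value used in the simulations of this section. The function names are illustrative, and the flux quantum is taken from CODATA rather than from the rounded 2.07 mV·ps quoted in the abstract, so the last digit may differ slightly from the values in the text.

```python
import math

PHI0 = 2.067833848e-15   # magnetic flux quantum in Wb (= V*s)


def josephson_inductance(ic):
    """L_J = Phi0 / (2*pi*Ic), in henries, for Ic in amperes."""
    return PHI0 / (2 * math.pi * ic)


def tau0(icr):
    """tau0 = Phi0 / (2*pi*Ic*R), in seconds, for IcR in volts."""
    return PHI0 / (2 * math.pi * icr)


lj = josephson_inductance(125e-6)
t0 = tau0(0.6e-3)
fwhm = 4 * t0                 # typical SFQ pulse width, about 4*tau0
f_low = 1 / (40 * t0)         # conservative speed estimate
f_high = 1 / (25 * t0)        # optimistic speed estimate

print(f"L_J   = {lj * 1e12:.2f} pH")
print(f"tau0  = {t0 * 1e12:.2f} ps, FWHM ~ {fwhm * 1e12:.1f} ps")
print(f"speed ~ {f_low / 1e9:.0f} to {f_high / 1e9:.0f} GHz")
```

For IcR = 0.6 mV this gives a maximum speed of several tens of gigahertz, in line with the 20-50 GHz operation targeted in this thesis.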

Simulations in this section will show how the SFQ pulse FWHM and the speed of the circuits scale with the IcR of the junctions, as predicted above. The effects of other parameters such as the dc bias level, the junction shunt condition βc, and the inductance values in the circuits are also investigated. We will further find that not only the pulse width but also the interaction between the pulses determines the speed of the circuits.

First we examine the SFQ pulse FWHM and the one-stage JTL delay in a 50-stage Josephson ring oscillator as shown in Fig. 2.2. Each stage is a single JTL stage. All 50 stages are identical in terms of the junction Ic, junction shunt resistance R and capacitance C, dc bias level Ib, and the circuit inductance Ls connecting to the next stage. In the simulation, we feed one artificial SFQ pulse to the ring oscillator. This single pulse is reshaped as it propagates and circulates in the ring oscillator. The ring is closed by inserting a voltage-controlled voltage source between stage 50 and stage 1, so the SFQ pulse circulates in the ring in one direction.

Figure 2.2 50-stage Josephson ring oscillator. All fifty stages are identical JTL stages, including Ic of the junctions, Ls, and the dc bias level Ib.

Fig. 2.3 shows the simulation results for a fixed dc bias level Ib/Ic = 0.7 and βL/(2π) = IcLs/Φ0 = 0.5, which are typical design values for a JTL, while varying IcR and βc. Fig. 2.3a shows the RSFQ pulse FWHM and τ0 vs. the junction IcR for βc ranging from 0 to 2. We can see that the RSFQ pulse FWHM is inversely proportional to IcR, as is τ0. However, βc affects the pulse width only weakly: when βc varies from 0 to 2, the pulse width increases by only a factor of about 1.4. This should not be confused with the statement that βc = 1 is the optimal shunt condition. There, Ic (and thus LJ) and C are fixed, and we are trying to find the optimal R that minimizes the larger of the two time constants LJ/R and RC. Here, Ic and R are fixed, so the time constant LJ/R is fixed. Increasing C (and thus βc) increases the other time constant RC, which has only a weak slowing effect on the junction, since LJ/R is the dominant time constant when βc < 1, and when βc > 1 the main effect of increasing C (and thus βc) is a slower decay of the ringing in the SFQ pulse. So the pulse FWHM increases weakly with increasing βc. As shown in Fig. 2.3b, the RSFQ pulse peak voltage is proportional to IcR, which is expected since the area under the pulse is a constant, one flux quantum. With βc increasing from 0 to 2, the pulse peak voltage decreases weakly. The delay td of a one-stage JTL in the ring oscillator and τ0 vs. IcR are plotted in Fig. 2.3c. The delay is inversely proportional to IcR, and βc affects the delay weakly. If we normalize the pulse width and the one-stage JTL delay by τ0, as plotted in Fig. 2.3d, they are almost constant over the entire IcR range. At the typical JTL design values, 70% dc bias level, βL/(2π) = 0.5, and βc = 1, the SFQ pulse FWHM and the one-stage JTL delay td in the ring oscillator are slightly larger than 4τ0.

Figure 2.3 Simulation of the 50-stage Josephson ring oscillator in Fig. 2.2. Ic = 0.2 mA, Ib/Ic = 0.7, Ls = 5.2 pH, βL/(2π) = 0.5. (a) The RSFQ pulse FWHM, τ0 vs. IcR. (b) The RSFQ pulse peak voltage vs. IcR. (c) The delay of one stage JTL, τ0 vs. IcR. (d) Normalized FWHM and one-stage JTL delay for βc = 1.

Fig. 2.4 shows the effect of the dc bias level Ib/Ic on the SFQ pulse FWHM and the one-stage JTL delay td. Here we have fixed IcR = 0.6 mV, βc = 1, and βL/(2π) = 0.5, so τ0 = 0.55 ps. From Fig. 2.4a, we can see that both the pulse FWHM and the delay td decrease with increasing dc bias level Ib/Ic. When Ib/Ic < 75%, the delay td is larger than the pulse FWHM; when Ib/Ic > 75%, td is smaller than the pulse FWHM. As Ib/Ic varies from 0.5 to 0.9, the FWHM changes from 4.8τ0 to 3.3τ0 and td changes from 6.3τ0 to 3τ0, as plotted in Fig. 2.4b. Increasing the dc bias level makes the circuit faster, but at the cost of upper dc bias margin. So we usually design and optimize the circuit starting with a 70% dc bias level to have enough dc bias margin at the design frequency. If needed, we can expect to push the circuit to run at higher speed by increasing the dc bias level, with reduced dc bias margin.

The JTL inductance Ls affects the SFQ pulse FWHM and the one-stage JTL delay td differently, as shown in Fig. 2.5. In this simulation, we fix IcR = 0.6 mV, so τ0 = 0.55 ps, Ib/Ic = 0.7 and βc = 1, and vary Ls. The FWHM changes very little when Ls varies, but td increases almost linearly with increasing Ls. When Ls varies from 1.3 pH to 15.6 pH, i.e., βL/(2π) varies from 0.125 to 1.5, the one-stage JTL delay td changes from 0.99 ps to 6.26 ps, i.e., from 1.8τ0 to 11.4τ0. The pulse FWHM first increases from 2.12 ps to 2.26 ps, i.e., from 3.9τ0 to 4.1τ0, with Ls increasing from 1.3 pH to 5.2 pH, i.e., βL/(2π) changing from 0.125 to 0.5. Then it decreases from 2.26 ps to 1.81 ps, i.e., from 4.1τ0 to 3.3τ0, when Ls continues to increase from 5.2 pH to 15.6 pH, i.e., βL/(2π) from 0.5 to 1.5. Although for a JTL itself Ls is usually chosen with βL/(2π) around 0.5, in some other circuits the inductance values can be larger, such as the storage inductor in the RS flip-flop, which has βL/(2π) of about 1.5, so we expect it to cause a larger delay. We’ll find in the next simulation that Ls governs the delay in the same way as it governs the minimum time interval for two consecutive incoming pulses not to interfere with each other. It is the pulse width combined with the interaction between the pulses that determines the circuit speed. We’ll quote some simulation results on JTLs reported by V. K. Kaplunenko [29] to verify this point.

Figure 2.4 Simulation of the 50-stage Josephson ring oscillator in Fig. 2.2. Ic = 0.2 mA; IcR = 0.6 mV, τ0 = 0.55 ps; βc = 1; Ls = 5.2 pH, βL/(2π) = 0.5. (a) The SFQ pulse FWHM and the one-stage JTL delay td vs. the dc bias level Ib/Ic. (b) FWHM/τ0 and td/τ0 vs. Ib/Ic.

Figure 2.5 Simulation of the 50-stage Josephson ring oscillator in Fig. 2.2. Ic = 0.2 mA, Ib = 0.14 mA; IcR = 0.6 mV, τ0 = 0.55 ps; βc = 1. (a) The SFQ pulse FWHM and one-stage JTL delay td vs. Ls. (b) FWHM/τ0 and td/τ0 vs. βL/(2π).

Figure 2.6 200-stage JTL. All the stages are identical, including Ic, dc bias Ib, inductance Ls and shunt condition βc.
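
The normalized quantities quoted for Fig. 2.5 can be reproduced from the raw values with a few lines of Python. This is only a bookkeeping sketch: the function name `beta_l_over_2pi` is illustrative, the flux quantum is the rounded 2.07 mV·ps from the abstract, and the average slope is a two-point estimate that assumes the almost-linear trend of td stated above.

```python
PHI0 = 2.07e-15     # flux quantum in Wb, rounded value quoted in the abstract
IC = 0.2e-3         # junction critical current in A (Fig. 2.5)
TAU0_PS = 0.55      # tau0 in ps for IcR = 0.6 mV


def beta_l_over_2pi(ls_henry):
    """Normalized inductance betaL/(2*pi) = Ic*Ls/Phi0."""
    return IC * ls_henry / PHI0


# (Ls in pH, td in ps, FWHM in ps) as quoted in the text;
# td at 5.2 pH is not quoted explicitly, so it is left as None.
quoted = [(1.3, 0.99, 2.12), (5.2, None, 2.26), (15.6, 6.26, 1.81)]

for ls_ph, td_ps, fwhm_ps in quoted:
    bl = beta_l_over_2pi(ls_ph * 1e-12)
    td_text = "n/a" if td_ps is None else f"{td_ps / TAU0_PS:.1f}"
    print(f"Ls = {ls_ph:4.1f} pH, betaL/2pi = {bl:.3f}, "
          f"FWHM/tau0 = {fwhm_ps / TAU0_PS:.1f}, td/tau0 = {td_text}")

# Average delay added per pH of inductance between the two end points.
avg_slope = (6.26 - 0.99) / (15.6 - 1.3)
print(f"average slope ~ {avg_slope:.2f} ps/pH")
```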

Fig. 2.6 shows a 200-stage JTL in which all stages are identical, including the junction critical current Ic, bias current Ib, inductance Ls and the shunt condition βc. The study shows that if the interval between two incoming SFQ pulses is less than a certain value ts, the two pulses repel each other while they propagate through the JTL until the saturation interval ts is reached. So the JTL can only operate correctly at a speed up to 1/ts; otherwise the timing information carried by the pulses is not retained. The curves in Fig. 2.7 show the time separation between the two pulses vs. the junction number as they propagate along the array, for various initial delays between them. As we can see, as long as the initial delay between the two pulses is less than 27.1 ps, the two pulses keep repelling each other until the delay reaches 27.1 ps. For curves with an initial delay larger than 27.1 ps, the delay between the two pulses remains stable during propagation. So for this example, the saturation time ts is 27.1 ps. Here, the bias level is Ib/Ic = 80%, βc = 0, βL/(2π) = 0.5, IcR = 0.25 mV, and τ0 = Φ0/(2πIcR) = 1.32 ps, so 1/ts is about 0.3(IcR/Φ0), or 1/(20τ0). JTLs are used broadly for interconnection in RSFQ circuits, so their speed sets an upper limit on the speed of RSFQ circuits. Considering the more general case of a 70% dc bias level and βc = 1, 1/(25τ0) is a better estimate of the speed limit of RSFQ circuits.

Figure 2.7 Pulse interval during propagation in a JTL array of 200 junctions for different initial delays between the two pulses. Ls = 7.8 pH, Ib = 0.1 mA, Ic = 0.125 mA, R = 2 Ω, βc = 0. After Fig. 3 in [29].
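
The normalization of the saturation time can be checked from the parameters of Fig. 2.7. The Python sketch below is plain arithmetic on the quoted values; the variable names are illustrative, and the flux quantum is again the rounded 2.07 mV·ps from the abstract.

```python
import math

PHI0 = 2.07e-15      # flux quantum in Wb, rounded value quoted in the abstract

ic = 0.125e-3        # junction critical current, A
ib = 0.1e-3          # bias current, A
r = 2.0              # shunt resistance, ohm
ls = 7.8e-12         # JTL inductance, H
ts = 27.1e-12        # saturation interval read from Fig. 2.7, s

icr = ic * r                           # 0.25 mV
tau0 = PHI0 / (2 * math.pi * icr)      # time unit tau0
bias_level = ib / ic                   # normalized dc bias
beta_l = ic * ls / PHI0                # betaL/(2*pi)
ts_norm = ts / tau0                    # saturation interval in units of tau0
f_max = 1 / ts                         # highest pulse rate that keeps its timing
ratio = f_max / (icr / PHI0)           # f_max in units of IcR/Phi0

print(f"tau0 = {tau0 * 1e12:.2f} ps, bias = {bias_level:.0%}, betaL/2pi = {beta_l:.2f}")
print(f"ts/tau0 = {ts_norm:.1f}, f_max = {f_max / 1e9:.1f} GHz = {ratio:.2f} IcR/Phi0")
```

The result, ts of roughly 20τ0, corresponds to the 1/(20τ0) limit quoted above for an 80% bias level and βc = 0.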

Figure 2.8 Normalized saturation time ts/τ0, pulse FWHM/τ0 and one-stage JTL delay
td/τ0 vs. Ib/Ic. βc = 0 for the calculation of ts/τ0 and βc = 1 for the calculation of
FWHM/τ0 and td/τ0. βL/(2π) = 0.5 for all three cases.

Simulations are also done to check how the saturation time ts changes with the parameters βc,
Ls and dc bias level Ib/Ic. We found that variation of βc has a very small effect on ts, causing less
than a 10% change of ts with βc varying from 0 to 1, which is consistent with the small effect of βc on
the pulse width and one-stage JTL delay discussed previously. The trend of ts vs. Ib/Ic and Ls
also agrees with what we found earlier for the pulse width and the one-stage JTL delay. We have
extracted the data of ts from Fig. 4 and Fig. 5 of Kaplunenko's paper and plot the normalized ts/τ0
for βc = 0 together with the normalized pulse FWHM/τ0 and one-stage JTL delay td/τ0 we calculated
earlier vs. Ib/Ic in Fig. 2.8. We also plot the normalized ts/τ0 with βc = 0, Ib/Ic = 0.8 together
with FWHM/τ0 and td/τ0 with βc = 1, Ib/Ic = 0.7 vs. βL/(2π) in Fig. 2.9. We can see from Fig. 2.8
that ts falls from 33τ0 to 19τ0 when Ib/Ic increases from 0.5 to 0.9, because both td and the
pulse FWHM decrease with Ib/Ic. At a 70% dc bias level, ts is about 23τ0; with the 10% increase
when βc changes to 1, ts is about 25τ0. From Fig. 2.9, we can see that ts increases almost linearly
with βL, or Ls, following the trend of td, while the FWHM remains almost constant. Not only
the SFQ pulse width but also the interaction between the pulses determines the speed of the circuit.
Figure 2.9 Normalized saturation time ts/τ0, pulse FWHM/τ0 and one-stage JTL delay
td/τ0 vs. βL/(2π). βc = 0, Ib/Ic = 0.8 for the calculation of ts/τ0 and βc = 1, Ib/Ic =
0.7 for the calculation of FWHM/τ0 and td/τ0.

Figure 2.10 A pendulum analog for a 3-stage JTL. Each pendulum is the analog of a
junction, and the torsion springs connecting the pendulums are the analogs of
the inductors connecting the junctions in the JTL.

It is easier to understand the dynamics with the aid of the pendulum analog. Picture the
JTLs as pendulums connected by torsion springs, as shown in Fig. 2.10. The pendulums are
the analogs of the junctions and the torsion springs are the analogs of the inductors connecting the
junctions in the JTLs. A larger inductance in the JTL is equivalent to looser springs
connecting the pendulums. The time it takes for a pendulum to flip once is the analog of the SFQ
pulse FWHM in the JTLs. All three pendulums are initially lifted to an angle θ away from the
vertical line in a plane, represented by the dotted circle, perpendicular to the axis along which the
springs lie. With an appropriate kick applied to the first pendulum, it will rotate around the axis by
360 degrees and reset to its initial position. Then the torsion in the first spring will fire the rotation
of the second pendulum, which induces a torsion in the second spring that fires the third pendulum. So
the disturbance propagates along the stages. The torsion in the first spring dies down after a
few stages of pendulums reset. If we want to pass two kicks along the stages without their interfering
with each other, we should apply the second kick a few stage delays later, after the motion in the
first spring dies down. The stiffer the springs are, the faster the disturbance propagates. The
faster the pendulum flips, the larger the torque applied to the spring, and so the faster the next
pendulum is fired. Back in the JTLs, the smaller the inductance Ls and the higher IcR, the shorter the
one-stage delay and the smaller the minimum interval ts between two incoming pulses.

2.1.2 Dependence of IcR on Jc in Low-Tc Niobium Process

The low-Tc Nb/AlOx/Nb tunnel junction has very hysteretic I-V characteristics, as shown in
Fig. 1.4. To be used in RSFQ circuits, a tunnel junction is shunted with an external resistance to
make βc = 1 in order to have nonhysteretic I-V characteristics. Recalling the expression for βc
in Eq. (1.7), we can rearrange it as


$$ I_c R = \sqrt{\frac{\beta_c \Phi_0}{2\pi} \cdot \frac{J_c}{C_s}} \tag{2.4} $$


where Jc is the critical current density, Cs is the specific capacitance of the junction, and R is the
total resistance of the external shunt resistance Rex in parallel with the junction subgap resistance Rsub.
Jc increases exponentially with the reduction of the barrier thickness while Cs increases linearly.
As seen in Fig. 1.3, when Jc increases 10 times from 1 kA/cm² to 10 kA/cm², Cs increases only by
1.26 times from 50 fF/µm² to 63 fF/µm². So we can treat Cs as almost constant when Jc is
varied. With βc = 1, a constant, we can make the approximation


$$ I_c R \propto \sqrt{J_c} \tag{2.5} $$

So for the niobium tunnel junctions we use in RSFQ circuits, the higher Jc, the higher IcR, and the
faster the circuits.


In the actual calculation, the Cs value from Fig. 1.3 is used in the junction model, so the
dependence of Cs on Jc is also counted. From Eq. (2.4), with βc = 1, we have

$$ I_c R = 1.815 \sqrt{J_c / C_s} \ \text{(mV)} \tag{2.6} $$

where Jc is in units of kA/cm² and Cs is in units of fF/µm². For the two processes we used for our
designs, the Jc values are 1 kA/cm² and 6.5 kA/cm², with Cs equal to 50 fF/µm² and 61 fF/µm²,
respectively, so the values of IcR are 0.257 mV and 0.592 mV. The junction models used in the
WRspice simulation are listed below.

* jjmod1k: unshunted junction; jjx1k: the same junction with the external shunt
.model jjmod1k jj(rtype=1, cct=1, icon=10m, vg=2.8m, delv=0.3m,
+ icrit=0.1m, r0=300, rn=26, cap=0.5p)
.model jjx1k jj(rtype=1, cct=1, icon=10m, vg=2.8m, delv=0.3m,
+ icrit=0.1m, r0=2.57, rn=2.36, cap=0.5p)
* Nb 1 kA/cm2, area=10 square microns

* jjmod6k5: unshunted junction; jjx6k5: the same junction with the external shunt
.model jjmod6k5 jj(rtype=1, cct=1, icon=10m, vg=2.8m, delv=0.3m,
+ icrit=0.1m, r0=300, rn=26, cap=0.094p)
.model jjx6k5 jj(rtype=1, cct=1, icon=10m, vg=2.8m, delv=0.3m,
+ icrit=0.1m, r0=5.92, rn=4.9, cap=0.094p)
* Nb 6.5 kA/cm2, area=1.538 square microns

jjmod1k is the model for a tunnel junction with Jc of 1 kA/cm². For Ic = 0.1 mA, the junction
has an area of 10 µm², subgap resistance Rsub = 300 Ω, normal resistance Rn = 26 Ω, and
capacitance C = 0.5 pF. jjx1k is the model for the shunted junction. An external shunt resistance
Rex = 2.59 Ω in parallel with the junction internal resistance gives the new Rsub = 2.57 Ω and Rn =
2.36 Ω. The shunted junction switches in the subgap region, so IcR = 0.257 mV.

Figure 2.11 DC bias margins vs. frequency for the T flip-flop shown in Fig. 1.11 with Jc of
1 kA/cm² and 6.5 kA/cm² and different input data patterns. Curves: (1) alternating 1s and 0s,
1 kA/cm²; (2) alternating 1s and 0s, 6.5 kA/cm²; (3) all 1s, 6.5 kA/cm².

jjmod6k5 is the model for a tunnel junction with Jc of 6.5 kA/cm². For Ic = 0.1 mA, the junction
has an area of 1.538 µm², subgap resistance Rsub = 300 Ω, normal resistance Rn = 26 Ω, and
capacitance C = 0.094 pF. jjx6k5 is the model for the shunted junction. An external shunt
resistance Rex = 6.04 Ω gives the new Rsub = 5.92 Ω and Rn = 4.9 Ω. So IcR = 0.592 mV.
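The model parameters above follow from Eq. (2.6) and a parallel-resistance calculation. The sketch below retraces that arithmetic under the assumptions stated in the text (βc = 1, Ic = 0.1 mA, unshunted Rsub = 300 Ω, Rn = 26 Ω); the helper names are illustrative and are not part of WRspice.

```python
import math

def icr_mv(jc_ka_cm2, cs_ff_um2, beta_c=1.0):
    """Eq. (2.6) with an explicit beta_c: IcR = 1.815*sqrt(beta_c*Jc/Cs), in mV."""
    return 1.815 * math.sqrt(beta_c * jc_ka_cm2 / cs_ff_um2)

def parallel(r1, r2):
    """Resistance of r1 in parallel with r2."""
    return r1 * r2 / (r1 + r2)

def external_shunt(r_total, r_junction):
    """External resistor that yields r_total when paralleled with r_junction."""
    return r_total * r_junction / (r_junction - r_total)

IC_MA = 0.1    # icrit used in all four models
R_SUB = 300.0  # unshunted subgap resistance, ohms
R_N = 26.0     # unshunted normal resistance, ohms

for jc, cs in [(1.0, 50.0), (6.5, 61.0)]:
    v = icr_mv(jc, cs)                 # target IcR in mV
    r_shunted = v / IC_MA              # total subgap resistance in ohms
    r_ex = external_shunt(r_shunted, R_SUB)
    print(f"Jc={jc}: IcR={v:.3f} mV, Rex={r_ex:.2f}, "
          f"r0={r_shunted:.2f}, rn={parallel(r_ex, R_N):.2f}")
```

Small differences in the last digit relative to the listed model values come from rounding Rex before computing the parallel combinations.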

Using the estimate 1/(25τ0) = 2πIcR/(25Φ0) = 121.4 IcR GHz, where IcR is in units of
mV, we estimate that the maximum circuit speed in the 1 kA/cm² and 6.5 kA/cm² niobium processes is
31 GHz and 72 GHz, respectively. For more complicated circuits the maximum speed will be
lower than these numbers. Fig. 2.11 shows the dc bias margins vs. frequency for the T flip-flop
shown in Fig. 1.11. For all three conditions, the circuit dc bias margins stay constant up to a
certain frequency; then the lower margin starts to decrease with frequency. The turning point (see
Fig. 2.11) corresponds to the frequency at which the pulses in the circuits start to interfere with each
other. Higher dc bias makes the pulse width narrower. At frequencies above the turning point, the
optimum dc bias increases to accommodate the shorter period.

Figure 2.12 Simulation of the T flip-flop shown in Fig. 1.11 with Jc = 6.5 kA/cm². (a) Correct
operation at 100 GHz. (b) Erroneous operation at 200 GHz.

Fig. 2.12 shows a comparison of correct operation at 100 GHz and erroneous operation at 200
GHz of the T flip-flop with Jc of 6.5 kA/cm². At 200 GHz, for both input and outputs, the pulses
repel each other, the interval between consecutive pulses is expanded, and the positions of the 0s
are now occupied by pulses. We can easily see that it is the interference between the pulses that
causes the failure of the circuit. With the input data pattern shown in Fig. 2.12 and Jc of 1 kA/cm²,
the dc margins of the T flip-flop start to decrease above 20 GHz, and the circuit works up to a
frequency above 66 GHz, as shown in Fig. 2.11. As a comparison, the dc margins of the T flip-flop
made with Jc of 6.5 kA/cm² start to decrease above 50 GHz, but the circuit continues to work up to a
frequency of 167 GHz. With an input data pattern of all 1s, the circuit dc bias margins start to
decrease at a higher frequency of 80 GHz, and the circuit continues to work up to 208 GHz with Jc of
6.5 kA/cm². This is because in this specific data pattern a pulse gets repelled from both sides, so
the effect of the pulse interference on timing is reduced. The case with an input pattern of all 1s
corresponds to the much reported direct high-speed testing results on T flip-flops, where an input
junction is overbiased to generate continuous 1s as input, and the average dc voltages across an input
junction and an output junction are measured to compare the input frequency and the output
frequency, since the average voltage across a junction is proportional to the pulse frequency,
V = Φ0 f. Table 2-1 lists the reported T flip-flop speed vs. Jc of the process in which the circuit
is implemented [20][21].

TABLE 2-1 Reported T flip-flop speed vs. Jc, and the minimum junction size amin.

Process    Jc (kA/cm²)    amin (µm)    Speed (GHz)
Hypres     1              3.0          120
Hypres     5              1.75         220
SUNY       6              1.5          240
SUNY       50             0.25         770

We can see that the circuit speed is roughly proportional to √Jc. Notice that for the SUNY 6 kA/cm²
process, chemical mechanical polishing is used to help the lithography define small junction areas
better. For the SUNY 50 kA/cm² process, E-beam writing, which is not suitable for larger circuits, is
used instead of photolithography to define the junctions because of their small size. The minimum
size of the junctions is discussed in the section below. As we discussed earlier, the speed
tested in this way is overly optimistic compared to the case where more complicated data patterns
are fed to the circuit. Also, for a realistic circuit operation speed, we want the circuit to operate at a
frequency below the turning point, so that the circuit has large dc bias margins. Compared to our
simulated speed of 208 GHz at 6.5 kA/cm², the reported speed of 240 GHz at 6 kA/cm² is slightly
higher, possibly because of the difference between the actual and design parameters.
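The relation V = Φ0 f used in these dc measurements, and the scaling of the Table 2-1 speeds with Jc suggested by Eq. (2.5), can be checked with a few lines. This is an illustrative sketch only; the function name is hypothetical and Φ0 is taken as 2.07 mV·ps.

```python
import math

PHI0_MV_PS = 2.07  # flux quantum in mV*ps

def average_voltage_mv(f_ghz):
    """Average dc voltage V = Phi0 * f of a junction emitting one pulse per period."""
    return PHI0_MV_PS * f_ghz * 1e-3  # 1 ps * 1 GHz = 1e-3

# (process, Jc in kA/cm^2, reported speed in GHz) taken from Table 2-1
reported = [("Hypres", 1, 120), ("Hypres", 5, 220), ("SUNY", 6, 240), ("SUNY", 50, 770)]

for process, jc, f_ghz in reported:
    v = average_voltage_mv(f_ghz)
    normalized = f_ghz / math.sqrt(jc)  # roughly constant if speed scales as sqrt(Jc)
    print(f"{process:7s} Jc={jc:>2}: V={v:.3f} mV, f/sqrt(Jc)={normalized:.0f} GHz")
```

The normalized values stay within roughly 20% of each other over a 50-fold range of Jc, which is the sense in which the scaling is only approximate.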
2.1.3 Junction Size Limitation

When we decide on the junction Ic level, there are limitations and trade-offs. First, since the
power consumption is proportional to the Ic of the junctions, we want to keep the Ic level as low as
possible. The power consumption of RSFQ circuits includes two parts: static power dissipated in
the bias resistors and dynamic power dissipated in the junctions during junction switching. The
voltage across the junction is zero except during its switching, so for static power, the voltage drop
across the resistor is the full bias voltage Vb. For each junction, the static power is

$$ P_{static} = I_b V_b = (I_b / I_c) I_c V_b \tag{2.7} $$

where Ib/Ic is the dc bias level. For each switching, the junction consumes energy
E = ∫ Ic V(t) dt = Ic Φ0, where V(t) is the SFQ pulse voltage across the junction. So for each
junction, the dynamic power is

$$ P_{dynamic} = I_c \Phi_0 f \tag{2.8} $$

Here f is the clock frequency of the circuit, and Pdynamic increases with the clock frequency f. If we
insert some typical parameters from our designs, Ic = 250 µA, Ib/Ic = 0.7, Vb = 5.75 mV, and f = 50
GHz, we get Pstatic = 1 µW and Pdynamic = 26 nW, about 40 times smaller. The static power
dominates, but both Pstatic and Pdynamic are proportional to Ic, so a lower Ic is favored for
reducing circuit power consumption.
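The per-junction power figures can be reproduced directly from Eqs. (2.7) and (2.8). The sketch below uses the typical parameters quoted in the text; the function names are illustrative.

```python
PHI0 = 2.07e-15  # flux quantum, V*s

def static_power_w(ic_a, bias_ratio, vb_v):
    """Eq. (2.7): power dissipated in the bias resistor, (Ib/Ic) * Ic * Vb."""
    return bias_ratio * ic_a * vb_v

def dynamic_power_w(ic_a, f_hz):
    """Eq. (2.8): one switching event of energy Ic*Phi0 per clock period."""
    return ic_a * PHI0 * f_hz

p_static = static_power_w(250e-6, 0.7, 5.75e-3)
p_dynamic = dynamic_power_w(250e-6, 50e9)
print(f"P_static  = {p_static * 1e6:.2f} uW")
print(f"P_dynamic = {p_dynamic * 1e9:.1f} nW")
print(f"ratio     = {p_static / p_dynamic:.0f}")
```

Because both terms are linear in Ic, halving Ic halves the total power at a fixed bias voltage and clock frequency.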


On the other hand, Ic must stay above a certain level to overcome thermal noise. The
junction coupling energy is Ec = (ħIc/2e)cos φ, and the thermal noise energy is proportional to
kBT. Detailed analyses [30] show that to achieve a bit error rate less than Γ, Ic should satisfy


4>o  2nTx0 


(2.9) 
For a reasonably low bit error rate (small Γτ0) at temperature T = 4.2 K, Ic should not be less than
50 µA. During switching, the effect of fluctuations is even more severe, so the minimum Ic is
usually taken above 100 µA [3]. We use 120 µA as the minimum Ic in our designs. So the minimum
junction size is amin = √(Ic,min/Jc), assuming a square junction. For Jc = 1 kA/cm², amin = 3.5 µm.
For Jc = 6.5 kA/cm², amin = 1.4 µm.
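The minimum junction size follows from the 120 µA floor on Ic and the critical current density. A short sketch of the unit conversion and the square-junction assumption (the function name is illustrative):

```python
import math

def min_junction_side_um(ic_min_ua, jc_ka_cm2):
    """Side of the smallest square junction that still carries ic_min_ua.

    Unit conversion: 1 kA/cm^2 = 10 uA/um^2.
    """
    jc_ua_per_um2 = 10.0 * jc_ka_cm2
    return math.sqrt(ic_min_ua / jc_ua_per_um2)

for jc in (1.0, 6.5):
    print(f"Jc = {jc} kA/cm^2: a_min = {min_junction_side_um(120.0, jc):.2f} um")
```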


When the junction size is larger than a few times the Josephson penetration depth λJ, the Ic of the
junction stops increasing with the junction area. So we use λJ as the maximum allowed junction
size.

$$ \lambda_J = \sqrt{\frac{\Phi_0}{2\pi \mu_0 (2\lambda + d) J_c}} \tag{2.10} $$

where λ is the magnetic penetration depth, d is the barrier thickness, and µ0 = 1.26 µH/m is the
permeability of free space (which can be used for nonmagnetic materials with good accuracy). Taking
typical values λ = 90 nm and d = 1 nm, amax = λJ ≈ √((1500 µA)/Jc). So amax/amin ≈ 3.5 and
Icmax/Icmin ≈ 12. The ratio is large enough for the typical Ic values in RSFQ circuits.
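Eq. (2.10) and the resulting size ratio can be evaluated as follows. This is a sketch under the typical values quoted in the text (λ = 90 nm, d = 1 nm, minimum Ic of 120 µA); the function names are illustrative.

```python
import math

PHI0 = 2.07e-15  # flux quantum, V*s
MU0 = 1.26e-6    # permeability of free space, H/m (value quoted in the text)

def characteristic_current_ua(lam_nm=90.0, d_nm=1.0):
    """Phi0 / (2*pi*mu0*(2*lambda + d)) in uA, so that lambda_J = sqrt(this / Jc)."""
    thickness_m = (2.0 * lam_nm + d_nm) * 1e-9
    return PHI0 / (2.0 * math.pi * MU0 * thickness_m) * 1e6

def josephson_penetration_depth_um(jc_ka_cm2, lam_nm=90.0, d_nm=1.0):
    """Eq. (2.10), with Jc converted from kA/cm^2 to uA/um^2 (factor of 10)."""
    return math.sqrt(characteristic_current_ua(lam_nm, d_nm) / (10.0 * jc_ka_cm2))

print(f"characteristic current = {characteristic_current_ua():.0f} uA")
for jc in (1.0, 6.5):
    a_max = josephson_penetration_depth_um(jc)
    a_min = math.sqrt(120.0 / (10.0 * jc))  # 120 uA minimum Ic, square junction
    print(f"Jc = {jc}: a_max = {a_max:.2f} um, a_max/a_min = {a_max / a_min:.2f}")
```

The ratio amax/amin does not depend on Jc, because both sizes scale as 1/√Jc.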

For the designs in this thesis, we used two different processes, the commercially available
HYPRES 1 kA/cm² process and the UCB high-Jc 6.5 kA/cm² Nb process. Using the discussion above, we
can summarize the main parameters for the circuits in Table 2-2.

TABLE 2-2 Key parameters for RSFQ circuits in the 1 kA/cm² and 6.5 kA/cm² Nb processes.

Key parameters    Hypres Present    UCB High Jc
Jc (kA/cm²)       1                 6.5
amin (µm)         3.5               1.35
IcR (mV)          0.257             0.592
fmax (GHz)        30-40             70-100

Considering the process variations, we chose to design 20 GHz circuits in the 1 kA/cm² process
and 50 GHz circuits in the 6.5 kA/cm² process. The 1.35 x 1.35 µm² junction is achievable yet
challenging. It was chosen as the smallest for which we had reliable spread data.

2.2  UCB  High-Jc  Niobium  Process 

In this section, we briefly introduce the UCB high-Jc niobium process [22][26][27] from a
designer's point of view. The comeback of superconductor digital ICs after the shutdown of the
IBM superconductor supercomputer project is largely credited to the establishment of the
Nb-based junction process, which replaced the Pb-based junctions used in that project. Unlike
the lead-based junction, which suffers from aging effects, the Nb-based junction is very stable over
time.

The UCB Nb process has 10 masks and 12 layers. Fig. 2.13 shows a schematic of the cross
section of the process. As we can see in Fig. 2.13, a tunnel junction is formed by the sandwich
structure Nb(CE)/AlOx/Nb(BE). The bottom Nb is called the base electrode (BE) and the top Nb is
called the counter electrode (CE). The junction area is determined by the size of the CE. Notice that
the barrier thickness listed in Table 2-3 is actually the thickness of the Al. Only a very thin layer on
the top of the Al is oxidized to form the barrier. The barrier thickness can then be adjusted through
oxidation to give different Jc values. A typical thickness of the AlOx is 1 nm. The highest Jc
achieved for the UCB Nb process is 26 kA/cm².

Table 2-3 lists the materials, thicknesses and process methods for each layer. The layers are
ordered from bottom to top according to the process flow. Insulator I and insulator II share one
mask and etching step. The junction counter electrode and anodization share one mask.

Figure 2.13 Cross section of UCB Nb integrated circuit process (not to scale).

TABLE 2-3 UCB Nb IC process flow

Layer             Material    Thickness (Å)    Process Method
Ground plane      Nb          1000             dc sputtering and RIE
Insulator (I)     SiO2        1500             ECR PECVD and RIE
Base electrode    Nb          2000             dc sputtering and RIE
Barrier           Al/AlOx     90 (Al)          dc sputtering and thermal oxidation
Counterelect.     Nb          600              dc sputtering and RIE
Insulator (II)    SiO2        1000             ECR PECVD and RIE
Resistor          Pd          400-800          E-beam evaporation
Insulator (III)   SiO2        1000             ECR PECVD and RIE
Wire (I)          Nb          3000             dc sputtering and RIE
Insulator (IV)    SiO2        5000             ECR PECVD and RIE
Wire (II)         Nb          6000             dc sputtering and RIE
Contact pads      Al/Ti/Au    100/100/2000     E-beam evaporation and lift-off

A few characteristics enable the UCB Nb process to produce high quality small junctions with
small critical current spreads. First, a 10:1 wafer stepper is used for lithography. Second, a high
precision E-beam mask is used for the junction-definition layer [28]. On the mask, the maximum
variation is controlled below 0.05 µm. With the 10:1 reduction, the variation caused by the mask
alone would be 0.005 µm on-chip, which is a 1% area error for a 1 µm junction. Third, light
anodization is done in a ring area surrounding the junctions as shown in Fig. 2.13. Our understanding
is that this serves three functions. The Nb CE and the thin barrier experience some degradation
during the RIE etching, causing the critical current density on the edge to be reduced. This
reduction can't be well controlled, producing a large Ic variation among junctions. Anodization
oxidizes this degraded thin layer along the edge of the junctions, greatly reducing the spread of the
junction Ic. At the same time, the anodized layer is a good insulating layer that prevents leakage
current from the CE to the BE, which might otherwise flow through pinholes in the SiO2 layer at the
edge of the junction or through the degraded AlOx, thus producing high quality tunnel junctions.
For the small junctions in the high Jc process, the junction size is typically less than 2 x 2 µm².
We may want to use a contact hole for the CE with a size equal to or larger than 2 x 2 µm². So the
contact hole is actually larger than the CE itself, which is only possible with the insulation of the
anodization layer. Fig. 2.14 shows SEM photos of a 0.3 µm² junction. Notice that the contact window
to the CE is actually larger than the CE and the entire contact window outside the CE sits in
the anodization ring area. So the upper wiring can only contact the CE and is insulated from the BE.

Figure 2.14 SEM photos of a 0.3 µm² high Jc junction. (a) The junction with wiring. (b)
Enlarged image of the junction CE and the contact window. Labeled features: anodization ring,
Nb wire (M2), contact window, junction CE.
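The mask-tolerance estimate given above for the junction-definition layer can be retraced as follows. The sketch assumes a square junction whose side changes by the full on-chip variation; the function names are illustrative.

```python
def on_chip_edge_error_um(mask_error_um, reduction=10.0):
    """Dimension error on the wafer after the stepper's optical reduction."""
    return mask_error_um / reduction

def relative_area_error(side_um, edge_error_um):
    """Exact fractional area change of a square junction whose side grows by edge_error_um."""
    return ((side_um + edge_error_um) ** 2 - side_um ** 2) / side_um ** 2

edge = on_chip_edge_error_um(0.05)  # 0.05 um on the mask, 10:1 stepper
for side in (1.0, 1.35, 1.5):
    print(f"{side} um junction: area error = {100 * relative_area_error(side, edge):.2f} %")
```

To first order the area error is 2δ/a, so the same mask variation matters less for larger junctions.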

Fig. 2.15a shows the I-V characteristics of the 0.3 µm² junction with Jc = 12 kA/cm². We can
see that even with such a small size, the junction still retains good tunnel junction I-V
characteristics. Vm = 12 mV, which gives a subgap resistance large enough to be ignored when the
junction is shunted by a small external resistance of a few ohms. That is why the exact value of the
subgap resistance r0 is not important in the junction models presented in Sec. 2.1.2.

Figure 2.15 I-V characteristics of high-Jc junctions. (a) The 0.3 µm² junction shown above, Jc
= 12 kA/cm², Vm = 12 mV (x-axis: 1 mV/div, y-axis: 50 µA/div). (b) 50 series junctions;
the junction size is 1.5 x 1.5 µm², Jc = 12 kA/cm², Jc spread is 1% (x-axis:
50 mV/div, y-axis: 200 µA/div).

Fig. 2.15b shows the I-V characteristics for a 50-junction series array. The junction size is
1.5 x 1.5 µm², Jc = 12 kA/cm². The critical current spread (minimum to maximum) is only 1%. This
spread doesn't include the run-to-run and chip-to-chip variations. A more realistic state-of-the-art Ic
spread is 2% (1σ) on junctions with size down to 1.5 x 1.5 µm², reported by TRW [23] after they
adopted the anodization approach in their process.

Another distinctive feature of the UCB Nb process is the low-temperature, low-stress ECR PECVD
SiO2 process for junction insulation. Since the ECR microwave plasma has a much higher density
and a very low ion energy compared to the traditional RF plasma, the ECR PECVD system can
deposit SiO2 at a high deposition rate and a low substrate temperature with very small damage to
surfaces. As a result, the insulation quality of the SiO2 layer is better. Uniformity of the layer is
also improved. And junctions experience much less damage because of the low stress and the low
substrate temperature.

The knowledge of the process flow and the thickness of the layers is used for inductance
calculation. We usually connect the wire II (M3) layer with the ground plane through vias to form
double ground planes, which reduces the inductance per unit length for inductors implemented in
M1 or M2. The trilayer Nb/AlOx/Nb can be used as wire beyond the junction area. We call it M1 in
that case.

The sheet resistance of the resistor layer can be adjusted through the layer thickness. It is 1 ohm
per square for the 1 kA/cm² process and 2.3 ohms per square for the 6.5 kA/cm² process.
CHAPTER 3

Design  and  Optimization  of  a 
Demultiplexer  and  a  Multiplexer 


3.1  Introduction 

Demultiplexers (DEMUX) and multiplexers (MUX) are useful circuits to change the data rate
and to implement conversion between serial data and parallel data. Large RSFQ systems are
usually composed of chips mounted on a multi-chip module (MCM). The connecting solder bumps
limit the data rate from chip to chip [31][32]. On-chip RSFQ circuits can operate up to several tens
of gigahertz in the current technologies and have the potential to run above 100 GHz. DEMUX and
MUX circuits can be used to change the data rate when the signals go between chips and back onto
chips. Due to the maturity of semiconductor circuits in digital signal processing and memory,
hybrid systems such as an RSFQ analog-to-digital converter followed by VLSI CMOS digital
signal processing circuits, or an RSFQ microprocessor combined with hybrid Josephson-CMOS
memory circuits, have been proposed and researched [33][34][35][36]. In such a system, DEMUX and
MUX are needed as interface circuits between the high-speed RSFQ circuits and the lower-speed
CMOS circuits. The serial-to-parallel converter also has applications in arithmetic logic units
(ALU) and special purpose hardware such as fast Fourier transform circuits and network switches.
3.2 Architecture Choice

3.2.1  DEMUX 

Based on applications, the DEMUX circuit can be either a synchronous or an asynchronous
design. Two main types of architecture are adopted in synchronous designs: the shift-and-dump
structure and the binary tree structure. In a shift-and-dump structure [37], shown in Fig. 3.1a, an
N-bit DEMUX can be constructed from N-stage modified non-destructive-read-out (NDRO) shift
registers. All N bits of data are shifted along the shift registers at the clock rate; then a read signal
is released to read out the N bits of data simultaneously. The advantage is that an arbitrary N-bit
DEMUX can be constructed in this way, and the layout configuration is straightforward. The
drawback is that every unit has to operate at the speed of the input signal during the data shifting.
The timing between the clock, data, and read signals is intricate since the delay variations of the
clock and read signals along the path can accumulate. The higher the speed and the larger the number
of bits, the more challenging the timing control. In the binary tree structure [38] shown in Fig.
3.1b, an 8-bit DEMUX is constructed from seven 2-bit DEMUX modules. In general, a 2^n-bit
DEMUX can be built from 2^n - 1 2-bit DEMUX modules. Only the module at the top of the tree
operates at the speed of the input data. The modules at each step down operate at a two-fold
reduced speed. At the bottom of the tree, the modules operate at 1/2^(n-1) of the input speed.

Figure 3.1 Block diagrams of two synchronous DEMUX architectures. (a) An 8-bit shift-and-
dump DEMUX. (b) An 8-bit binary tree DEMUX.

Figure 3.2 Block diagram of an asynchronous 1:8 DEMUX binary tree architecture.

We design a 1:8 DEMUX based on the asynchronous binary tree architecture [39][40] shown
in Fig. 3.2. Compared to the two synchronous architectures above, it eliminates the complex tasks
of clock generation and distribution. It also retains the advantage of the binary tree structure of
lowering the operation speed after the first stage.
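The data routing in a binary tree DEMUX can be illustrated with a purely behavioral model. The sketch below is not a circuit description: it assumes each 2-bit module simply alternates incoming bits between its two outputs, and the function names are illustrative.

```python
def demux_1_to_2(stream):
    """Behavioral 1:2 DEMUX: alternate incoming bits between two outputs."""
    return stream[0::2], stream[1::2]

def demux_tree(stream, levels):
    """Return the leaf streams, in tree order, of a binary tree of 1:2 modules."""
    if levels == 0:
        return [stream]
    even, odd = demux_1_to_2(stream)
    return demux_tree(even, levels - 1) + demux_tree(odd, levels - 1)

def module_count(levels):
    """Number of 1:2 modules in a tree with 2**levels outputs."""
    return 2 ** levels - 1

bits = [f"D{i}" for i in range(16)]   # two consecutive 8-bit words
leaves = demux_tree(bits, 3)
print("modules:", module_count(3))
print("first word at the leaves:", [leaf[0] for leaf in leaves])
for level in range(3):
    print(f"level {level}: modules run at 1/{2 ** level} of the input rate")
```

In this model the leaves deliver the bits in bit-reversed order (0, 4, 2, 6, 1, 5, 3, 7), which matches the output labeling in Fig. 3.2.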

3.2.2  MUX 

Several architectures for MUX circuits are compared. Shown in Fig. 3.3a is a load-and-shift
8:1 MUX architecture. It consists of eight stages of identical shift registers (SR). Each basic cell is
a one-stage shift register. With a Load pulse, the external parallel data D0, D1, ..., DN are selected by
the SRs to shift to their outputs; otherwise the output from the previous stage is selected. So every
eight high-speed clock cycles, the external data are loaded once. Then the high-speed clock shifts
all the remaining seven bits of data from left to right serially. The high-speed clock rate and the
output data rate are eight times the input data rate. Similar to the shift-and-dump DEMUX, a
load-and-shift MUX has the advantage that an arbitrary N-bit MUX can be built and the layout
configuration is straightforward. But every basic cell in this architecture needs to operate at the
output speed, the highest data rate in this circuit. Besides the timing between the input data D0, D1,
..., DN and Clock, the timing between the data output from the previous stage and Clock, and the
timing between Load and Clock all have to be controlled at the highest data rate. The design of the
basic cell is also very challenging. The multiple loops that may be needed in the basic cell, due to
the complexity of its function, could limit the dc bias margin to a very small value at high speed.
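A behavioral model makes the load-and-shift timing concrete. In this sketch each stage takes the external bit on a Load tick and the previous stage's bit otherwise; the ordering convention (rightmost stage drives the output) and the function name are assumptions made for illustration.

```python
def load_and_shift_mux(words):
    """Serialize parallel words: one Load per word, one output bit per fast clock tick."""
    n = len(words[0])
    register = [0] * n
    serial = []
    for word in words:
        for tick in range(n):
            if tick == 0:
                register = list(word)            # Load: select the external parallel data
            else:
                register = [0] + register[:-1]   # Shift: select the previous stage's output
            serial.append(register[-1])          # the rightmost stage drives the output
    return serial

word = [1, 0, 1, 1, 0, 0, 1, 0]
print(load_and_shift_mux([word]))  # the rightmost bit of the word leaves first
```

Every stage is updated on every fast clock tick, which is the reason all cells in this architecture must run at the output data rate.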

Figure 3.3 Block diagrams of two 8:1 MUX architectures. (a) Load-and-shift architecture.
(b) Ripple logic architecture.

For comparison, Fig. 3.3b shows a ripple logic 8:1 MUX. In this architecture, no load signal is
needed. Both Clock1 and Clock2 run at eight times the input data rate, and there is a delay between
Clock1 and Clock2. A T flip-flop (TFF) binary tree divides Clock1 into eight clock signals at the
input data rate, with their phases evenly spaced. One phase interval equals one Clock1 period. The
8-bit input data are therefore clocked at the input rate but with eight evenly spaced phases. When
they ripple through and are combined by the CB networks, the parallel input data are converted to
serial data with an eight times higher data rate. The D flip-flop placed after the CB recovers
dual-rail outputs if the application requires them; otherwise it can be removed. The main advantage
of this architecture is that only one TFF at the top of the tree, one CB before the D flip-flop, and
the D flip-flop itself need to operate at the highest data rate. The key to this design is to balance
the delays of the eight clock-data paths, traced from Clock1 to the clock inputs of the eight RS
flip-flops and then from the outputs of the eight RS flip-flops to the output of the last CB. The
drawback is that only 2^n-bit MUX circuits can be constructed this way. We chose to build an 8:1 MUX
based on the ripple logic architecture because the timing requirement is more relaxed and the
components are simpler than for the other architectures.
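
The clock division in the ripple logic MUX can be illustrated with a small behavioural sketch. It is an idealized model, not the circuit itself: every gate delay is ignored, a TFF is assumed to route incoming pulses alternately to its two outputs, and the function names (`tff_split`, `tff_tree`, `ripple_mux`) are hypothetical. Time is counted in Clock1 periods.

```python
def tff_split(pulses):
    """Ideal T flip-flop: send incoming pulses alternately to two outputs."""
    return pulses[0::2], pulses[1::2]


def tff_tree(pulses, depth):
    """Binary tree of ideal TFFs; returns the pulse trains at the 2**depth leaves."""
    if depth == 0:
        return [pulses]
    first, second = tff_split(pulses)
    return tff_tree(first, depth - 1) + tff_tree(second, depth - 1)


def ripple_mux(words, depth=3):
    """Serialize parallel words (lists of 0/1 bits) using the divided clock phases."""
    n = 2 ** depth
    clock1 = list(range(n * len(words)))      # Clock1 pulse times, one per period
    leaves = tff_tree(clock1, depth)
    # Assumption: input bit i is clocked by the leaf whose first pulse arrives at time i.
    by_phase = {leaf[0]: leaf for leaf in leaves}
    out_times = set()
    for bit_index in range(n):
        for word_index, t in enumerate(by_phase[bit_index]):
            if words[word_index][bit_index]:
                out_times.add(t)              # a "1" releases an SFQ pulse at time t
    # Merging all pulses (the role of the CB network) gives the serial stream.
    return [1 if t in out_times else 0 for t in clock1]


words = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]]
serial = ripple_mux(words)
print(serial)
```

In this model each leaf clock has a period of eight Clock1 periods, and the eight leaves start at eight different offsets, which is why only 2^n-bit multiplexers fit the scheme.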

3.3  Circuit  Factors  of  Merit 

The factors of merit in the MUX and DEMUX design include speed, yield, dc bias margin, parameter
margins, power, and area.

Correct functioning at the targeted operation speed is the first thing we need to achieve in the
design. Circuits are verified and optimized at the operation speed. As discussed in Chap. 2, the
maximum speed of RSFQ circuits is proportional to the junction IcR value, which in turn is
determined by the junction critical current density. We chose to design a 20 GHz 1:8 DEMUX and a
20 GHz 8:1 MUX for the HYPRES 1 kA/cm² niobium process and ported them to the UCB 1 kA/cm² niobium
process with layout modifications. A 50 GHz 1:8 DEMUX and a 50 GHz 8:1 MUX were also designed for
the UCB 6.5 kA/cm² niobium process. At such high operation speeds, timing is especially important.

Yield is another important factor. Due to process variations, the fabricated circuit parameters
are not the same as the designed values. Yield is defined as the fraction of working parts among a
large number of fabricated parts. Circuits must be designed to be robust enough to achieve good
yield in spite of the randomly spread parameters. Monte Carlo analysis can be used to calculate a
theoretical circuit yield based on the process variations.

Dc bias margin is defined as the operational dc bias voltage range assuming all the circuit
parameters are at their nominal values. The nominal dc bias voltage of the 20 GHz, 1 kA/cm² design
is 2.5 mV; that of the 50 GHz, 6.5 kA/cm² design is scaled to 5.75 mV. In a large system, each
component is designed to have a large dc bias margin, so that when the components are put together
the circuits can still work with a common dc bias voltage with a certain margin. A large dc bias
margin also helps to overcome non-idealities such as thermal noise and ground bounce. Dc bias
margin can be evaluated from simulation and verified in testing.

Parameter margins are the operational ranges of the parameters, each found by varying one parameter
while the other parameters are kept at their nominal values. The purpose of designing with large
parameter margins is to allow for the process variations.

The power consumption in RSFQ circuits includes two parts, the static power and the dynamic power.
As stated in Section 2.1.3, the powers can be estimated as $P_{static} = I_b V_b = (I_b/I_c) I_c V_b$
and $P_{dynamic} = I_c \Phi_0 f$. While the dynamic power scales with the circuit speed, the static
power does not. In the 1 kA/cm² design, for a junction with Ic = 250 µA, Ib/Ic = 0.7, Vb = 2.5 mV,
and f = 20 GHz, we get Pstatic = 0.44 µW and Pdynamic = 10 nW. In the corresponding 6.5 kA/cm²
design, with f = 50 GHz and Vb = 5.75 mV, we get Pstatic = 1 µW and Pdynamic = 26 nW. In both cases,
the static power dominates. This dominance can extend to a few hundred gigahertz. In contrast, the
power consumption in CMOS circuits scales up with increasing operation speed, and heat dissipation
is a bottleneck in CMOS technology scaling. Low power consumption extending to a very high
operation speed is one of the main advantages of superconductor RSFQ circuits.

To reduce the power consumption, both Ic and the dc bias voltage can be reduced. The minimum Ic
value in our design is around 100 µA. The corresponding junction size is around 3 µm x 3 µm in the
1 kA/cm² process, which is a relatively comfortable target. The corresponding junction size is
1.3 µm x 1.3 µm in the 6.5 kA/cm² process (6.5 kA/cm² was chosen because good spreads were already
demonstrated for 1.3 µm x 1.3 µm junctions in the UCB process). The dc bias voltage commonly used
in the field for 1 kA/cm² designs is 2.5 mV. We used 5.75 mV in the 6.5 kA/cm² design for
convenience in porting the layout of the 1 kA/cm² design. The shunt resistance for the same
junction in the 6.5 kA/cm² process is increased to 2.3 times its value in the 1 kA/cm² process to
keep βc = 1. Instead of changing the resistor layout, the sheet resistance in the 6.5 kA/cm²
process is adjusted to 2.3 times that in the 1 kA/cm² process. To keep the correct dc bias current
values, the dc bias voltage is therefore increased to 5.75 mV, which is 2.3 times 2.5 mV. The dc
bias voltage is not chosen to minimize the power consumption in the current 6.5 kA/cm² design;
instead it is chosen for convenience in porting old designs.
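
The power estimates quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the flux quantum value of 2.07 mV·ps and the same 250 µA junction with Ib/Ic = 0.7 in both designs; the function names are hypothetical. The crossover frequency is simply the frequency at which the two expressions would be equal, which is independent of Ic.

```python
PHI0 = 2.07e-15  # flux quantum in Wb, i.e. 2.07 mV*ps


def static_power(ic, ib_over_ic, vb):
    """P_static = I_b * V_b = (I_b / I_c) * I_c * V_b, in watts."""
    return ib_over_ic * ic * vb


def dynamic_power(ic, f):
    """P_dynamic = I_c * Phi_0 * f, in watts."""
    return ic * PHI0 * f


def crossover_frequency(ib_over_ic, vb):
    """Frequency at which P_dynamic would equal P_static (I_c cancels out)."""
    return ib_over_ic * vb / PHI0


designs = {
    "1 kA/cm^2, 20 GHz": dict(ic=250e-6, ib_over_ic=0.7, vb=2.5e-3, f=20e9),
    "6.5 kA/cm^2, 50 GHz": dict(ic=250e-6, ib_over_ic=0.7, vb=5.75e-3, f=50e9),
}

for name, d in designs.items():
    ps = static_power(d["ic"], d["ib_over_ic"], d["vb"])
    pd = dynamic_power(d["ic"], d["f"])
    fx = crossover_frequency(d["ib_over_ic"], d["vb"])
    print(f"{name}: P_static = {ps * 1e6:.2f} uW, P_dynamic = {pd * 1e9:.1f} nW, "
          f"crossover near {fx / 1e9:.0f} GHz")
```

With these inputs the static power exceeds the dynamic power by a factor of roughly 40 in both designs, which is consistent with the statement that the static power dominates up to a few hundred gigahertz.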

Area  is  another  figure  of  merit  of  the  circuit.  In  our  design  and  layout,  we  focused  on  getting  a 
robust  working  circuit.  Circuit  area  is  not  a  focus  for  the  time  being. 

3.4  The  Design  Procedure 

A typical design procedure is illustrated in the flow chart in Fig. 3.4. The main tasks include
schematic capture, pre-layout simulation and optimization, layout, inductance extraction, and
post-layout simulation and optimization. First a circuit schematic is created and captured. Then a
pre-layout simulation is done to verify the circuit function. It may take several iterations to
achieve the correct function. Then the optimization is performed to increase the circuit parameter
margins and to improve the circuit yield. Several CAD tools can be employed to assist the
optimization. Margin analysis and Monte Carlo analysis are used to evaluate the circuit
performance. The optimization stops when the circuit performance is satisfactory. Layout is done
based on the optimized circuit parameters. During the transformation from the schematic to the
layout, circuit parameters are altered. The junction sizes change to the closest values in the
pre-drawn junction library. The actual inductance values and the parasitic inductance values are
extracted. With the new circuit parameters, post-layout simulations and analyses are done to check
the circuit function and performance again. In most cases, the circuit function is still correct
but the circuit performance deteriorates with the addition of parasitic inductances. If the
function also fails, the circuit parameters and the layout need to be modified until the
post-layout simulation shows that the function is correct. Then post-layout optimization is
performed to improve the circuit performance until satisfactory results are achieved. In the
post-layout optimization, parasitic inductances are included and constraints imposed by the
practical layout are considered.

The CAD tools investigated and employed in our design include Xic [41] for schematic capture and
layout; WRspice [41], JSIM [42], and JSPICE3 [41] for circuit simulation and analysis; WinS [43],
MALT [44], and MJSIM [45] for optimization; the Cadence Virtuoso layout tool for layout; and
INDUCT [42] and LMETER [46] for inductance calculation or extraction.

Details of some tasks, the analysis methods, and the use of the related CAD tools are introduced in
the following sections.

3.4.1  Schematic  Capture 

A schematic is a way to visually describe and record the circuit configuration and parameters.
Both Xic and WinS can be used for schematic capture in RSFQ circuit design. However, WinS is mainly
an RSFQ circuit optimization tool. The schematics captured in WinS can only be simulated in WinS,
and only resistively shunted junctions (RSJs) and RSFQ circuits can be captured and simulated in
WinS. Schematics are therefore captured in WinS only as part of the optimization. Compared with
WinS, Xic is a more versatile tool for IC design. Besides Josephson junctions, inductors, and
resistors, other devices such as transmission lines, mutual inductors and MOSFETs are also
supported. Various current sources and voltage sources can also be captured to set up simulations.
Both tunnel junctions and resistively shunted junctions can be used in the circuits. The captured
schematics can be simulated within the tool by calling WRspice. The junction models can be modified
by the user to facilitate both pre-layout and post-layout simulation. Furthermore, a SPICE netlist
including both the circuit configuration and the simulation setup can be exported from Xic.

3.4.2  Circuit  Simulation 

The state-of-the-art superconductor circuit simulator is WRspice. It is SPICE based and fully
incorporates Josephson junction devices. It has many features needed in modern superconductor
integrated circuit design, and it is the main simulation tool used in our design work. Two other
simulation tools, JSPICE3 and JSIM, are used as the simulation engines in the optimization tools.

3.4.2.1 Functional Check

The circuit function is checked in the simulations. For RSFQ circuits, usually the node voltages,
the phases of the junctions, and the currents flowing through the inductances are monitored. The
circuit function can be checked visually from the plotted signal waveforms. A measurement statement
can be used to extract various information such as timing, power, voltage, current, and junction
phase. The information can then be analyzed for further design improvement. A control block can be
added to the circuit input file to set the pass/fail criteria, including the information obtained
from the measurement, so that the program reports pass/fail automatically after a simulation run.

3.4.2.2 Margin Analysis

There is a built-in function in WRspice to check a two-dimensional operating range. This can be
used to check a parameter margin conveniently. Compared to the margin analysis in other
optimization tools, the pass/fail criteria can be more complicated and more flexible, so the
circuit function check is more complete.

3.4.2.3 Monte Carlo Analysis

Monte Carlo analysis is a statistical method to simulate the effect of process variations on the
circuit function and performance. There are global process variations and local process variations.
The global process variations reflect run-to-run, wafer-to-wafer, and chip-to-chip process
variations, while the local process variations are the process variations within the same chip.
Usually the global variations are much larger than the local variations. For a specific process,
measurement data of a large number of samples are gathered to get the standard deviation of a
parameter,

$\sigma = \sqrt{\sum_{k=1}^{N} (x_k - \bar{x})^2 / N}$,

where $x_k$ is the kth measured parameter value, $\bar{x} = \sum_{k=1}^{N} x_k / N$ is the average
value, and N is the total sample number and should be large. For global variations, the $x_k$ are
gathered from different runs, different wafers, and different chips. For local variations, the
$x_k$ are from the same chip. In a simulation, a circuit parameter is generated equal to
(nominal value × p_global × gauss(σ_local, 1)), with p_global = gauss(σ_global, 1). Here
gauss(σ, 1) is a pseudo-random number generated by the simulator from a Gaussian probability
distribution centered at 1.0 with standard deviation σ. In one simulation run, each time
gauss(σ, 1) is called, a different random number is generated. So in each simulation,
gauss(σ_global, 1) is called only once and assigned to p_global to reflect the global variation for
one parameter category, whereas gauss(σ_local, 1) is called for each parameter to reflect the local
variation. The circuit parameter values are thus randomly generated in a simulation to mimic a real
process run. Over a large number of simulation runs, we can evaluate the circuit behavior
statistically.
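
The parameter generation described above can be sketched outside the simulator as follows. This is a minimal illustration in Python rather than a WRspice script: `draw_parameters` and the nominal component values are hypothetical, and the spreads are the 3σ values of Table 3-1 divided by three. Note that Python's `random.gauss` takes the mean first and the standard deviation second.

```python
import random

# 3-sigma fractional spreads from Table 3-1, converted to 1-sigma values
SIGMA_GLOBAL = {"R": 0.23 / 3, "Ic": 0.37 / 3, "L": 0.15 / 3}
SIGMA_LOCAL = {"R": 0.025 / 3, "Ic": 0.11 / 3, "L": 0.05 / 3}


def draw_parameters(nominal, sigma_global, sigma_local, rng):
    """Generate one random parameter set, mimicking one process run.

    nominal maps a component name to (category, nominal value).
    """
    # One global factor per parameter category, drawn once per simulation run
    p_global = {cat: rng.gauss(1.0, s) for cat, s in sigma_global.items()}
    # One independent local factor per individual component
    return {
        name: value * p_global[cat] * rng.gauss(1.0, sigma_local[cat])
        for name, (cat, value) in nominal.items()
    }


# Hypothetical nominal values, for illustration only
NOMINAL = {
    "J1": ("Ic", 250e-6),
    "J2": ("Ic", 250e-6),
    "L1": ("L", 4e-12),
    "Rb1": ("R", 5.75),
}

rng = random.Random(1)
sample = draw_parameters(NOMINAL, SIGMA_GLOBAL, SIGMA_LOCAL, rng)
for name, value in sample.items():
    print(name, value)
```

Because J1 and J2 share the same global factor, they move together from run to run and differ only by their smaller local factors, which is the behavior the two-level model is meant to capture.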

Listed in Table 3-1 are the process variations of the HYPRES 1 kA/cm² niobium process used in our
calculations. The numbers are summarized from measurements of a large number of samples. Since
HYPRES guarantees the critical current density within 15% deviation and the sheet resistance value
within 20% deviation, we constrain abs(p_global,Ic - 1) within 15% and abs(p_global,R - 1) within
20% during the random parameter generation.


TABLE 3-1 Process variations of HYPRES 1 kA/cm² niobium process.

Parameter          3σ global variation   3σ local variation
Resistance         23%                   2.5%
Critical current   37%                   11%
Inductance         15%                   5%

Listed in Table 3-2 are the process variations of the UCB 6.5 kA/cm² niobium process used in our
calculations. The numbers are from a limited number of successful runs. They should be treated as
reachable goals instead of statistical summaries.


TABLE 3-2 Process variations of UCB 6.5 kA/cm² niobium process.

Parameter          3σ global variation   3σ local variation
Resistance         7.5%                  2.8%
Critical current   10%                   3%
Inductance         15%                   5%

Monte Carlo analysis is applied to predict the circuit yield in our designs. The yield is defined
as the ratio of the number of passing runs to the total number of runs. By the statistical nature
of the Monte Carlo analysis, the estimated yield is approximately Gaussian distributed. The
calculated yield Y is the mean value, and the variance of the yield is $\sigma^2 = Y(1 - Y)/N$,
where N is the total number of runs. For a 95% confidence level, the confidence interval is
$L = 2\sigma = 2\sqrt{Y(1 - Y)/N}$; that is, the predicted yield lies in the range Y ± L with a 95%
probability [47]. The total number of runs is usually above 100, and the circuit is normally
optimized to a calculated yield above 99%. With 100 runs and a calculated yield of 99%, the yield
lies in the range of 97% to 100% with a 95% probability. Monte Carlo analysis is also used to
estimate the timing variation along the data path due to process variations in the MUX design.
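
The confidence interval quoted above follows directly from the variance formula. The sketch below evaluates it for the 100-run, 99% case; the function name `yield_interval` is hypothetical, and the interval is clipped to the physically meaningful range between 0 and 1.

```python
import math


def yield_interval(passes, runs):
    """Return (Y, lower, upper) for a roughly 95% (two-sigma) confidence interval."""
    y = passes / runs
    sigma = math.sqrt(y * (1.0 - y) / runs)   # standard deviation of the yield estimate
    half_width = 2.0 * sigma                  # L = 2 * sigma
    return y, max(0.0, y - half_width), min(1.0, y + half_width)


y, low, high = yield_interval(passes=99, runs=100)
print(f"Y = {y:.2%}, interval = {low:.1%} to {high:.1%}")
```

Since the half-width shrinks only as the square root of the number of runs, narrowing the interval by a factor of two requires about four times as many simulation runs.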

In WRspice, the yield calculation can be done easily using the built-in Monte Carlo analysis
function. For the timing variation, a separate script is written to run the simulations repeatedly
and extract the timing information.

3.4.3  Comparison  of  Optimization  CAD  tools 

The  purpose  of  optimization  is  to  build  a  robust  circuit  in  spite  of  the  process  variations.  So  the 
optimization  should  be  a  process  to  improve  the  circuit  yield. 

Several optimization CAD tools and the methods they are based on are compared. Table 3-3 lists
three RSFQ circuit optimization tools, WinS, MALT and MJSIM, together with their main features,
advantages and disadvantages.


TABLE 3-3 Comparison of three RSFQ circuit optimization CAD tools: WinS, MALT and MJSIM

CAD tool            WinS                  MALT                       MJSIM
Figure of merit     Critical margin       Margin along critical      Yield
                                          direction
Simulation engine   WinS                  JSPICE3                    JSIM
Advantages          Many parameters       Process variations         Process variations
                                          considered                 considered
Disadvantages       Process variations    At most 8 parameters;      Computationally
                    not considered        convex operating region    costly
                                          required

WinS is a Windows program which can do RSFQ circuit simulation, margin analysis and optimization.
The figure of merit in WinS optimization is the critical margin, defined as the smallest among all
the circuit parameter margins. Each circuit parameter margin is found with all other parameters
kept at their nominal values. WinS tries to improve the circuit yield by maximizing the critical
margin. This is an indirect but often effective way to improve the circuit yield. The algorithm
implementation is straightforward, and a large number of parameters can be included in one
optimization. However, the result does not guarantee optimal circuit yield. First, process
variations are not taken into consideration. Different circuit parameters such as junction critical
currents and inductances can have different process variations, and the global process variation of
a parameter is also different from the local process variation. But in the WinS optimization, all
the parameters or parameter combinations are treated equally. Second, WinS optimizes the critical
margin along the parameter axes with only one parameter varying. In reality, all the parameters can
deviate from their nominal values simultaneously. The smallest margin in the operating space may
not lie along the direction of a parameter axis.
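
The critical margin defined above is simply the smallest one-at-a-time margin. The sketch below illustrates the definition only; it is not WinS code, the function name `critical_margin` is hypothetical, and the sample margins are taken from column (a) of Table 3-5 purely as inputs.

```python
def critical_margin(margins):
    """Return (parameter name, critical margin in percent).

    margins maps a parameter name to its (lower, upper) margin in percent,
    each found with all other parameters held at their nominal values.
    """
    def smallest_side(name):
        lower, upper = margins[name]
        return min(abs(lower), abs(upper))

    worst = min(margins, key=smallest_side)
    return worst, smallest_side(worst)


sample_margins = {
    "XL": (-27.0, 54.0),
    "XIcb": (-14.5, 26.6),
    "Vbias": (-18.8, 20.3),
    "Rb1": (-26.1, 29.6),
}

name, margin = critical_margin(sample_margins)
print(f"critical margin: {margin}% on {name}")
```

The figure of merit treats every parameter alike and looks only along the parameter axes, which is exactly the pair of limitations discussed in the text.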

To address the above two issues, MALT optimizes the margin along the critical direction. It uses
an inscribed-sphere algorithm. A convex hull approximating the circuit operating region is expanded
and refined iteratively. A sphere (the largest that will fit) is inscribed in the hull and the
largest tangent plane is found. The perpendicular passing through the center of this plane defines
the direction of the next binary search. The new boundary point is found, and the hull and
inscribed sphere are redrawn. When the optimization is done, the optimum parameter values lie at
the center of the sphere, and the radius of the sphere is a measure of the allowed variation. The
directions of the radius vectors to the tangent planes are the critical directions along which the
parameter variations are most restricted. The process variations are taken into consideration when
the convex hull is formed: the operating region is scaled along each parameter axis to make the
axis with the larger process variation more critical. Theoretically, this algorithm should achieve
better circuit yield since both the multi-dimensional circuit operating range and the process
variations are evaluated during the optimization. But there are some practical limitations in
applications. First, the recommended number of parameters in each optimization is no larger than
eight. Even for the simplest RSFQ circuit, eight dimensions are not enough. The practical strategy
is to include the most critical parameters, such as the global inductance variation and the global
bias current variation, in all optimizations. Other parameters are separated into several
optimizations. The iterations are gone through manually until a satisfactory result is achieved.
Second, the operating region of the optimized parameters has to be a convex region. In RSFQ
circuits, the operating region of the global inductance and the global junction critical current
is concave. To solve this problem, we use a derived parameter, the inverse of the critical current,
in the optimization to change the operating region to a convex contour. But not every case with a
concave region can be visualized and solved this way, so we might get a locally optimal parameter
set depending on the initial values.

MJSIM uses yield directly as its figure of merit. The simulation engine underneath is JSIM, another
Josephson junction simulator. This program was still under development at the time of this work.
The main drawback is the computation cost. For each parameter set, hundreds of simulation runs are
needed to evaluate the corresponding circuit yield.

In our design work, both WinS and MALT are used to help automate the optimization. However, because
the pass/fail criteria in WinS and MALT are restricted, margin analysis and yield calculation are
performed in WRspice to check and confirm the circuit performance.

3.4.4  Layout  and  Inductance  Extraction 

Layout is done either in the Cadence Virtuoso layout tool or in the Xic physical mode. The basic
flow is floor planning, physical implementation, and reviewing with design rule check (DRC). The
DRC rules for the specific process need to be compiled by the designer. Layout-versus-schematic
(LVS) checking is not set up in either tool, so verifying that the layout matches the circuit
schematic relies on labor-intensive review by the designer. This is where the design flow can be
improved.

3.4.4.1  Junction  Layout 

A library of junctions, unshunted or shunted with two kinds of shunt resistor placement, is
pre-drawn. During circuit layout implementation, the junction size is always rounded to the closest
junction size in the junction library. Fig. 3.5 shows a junction layout example from the
6.5 kA/cm² library, with Ic = 251 µA and Rs = 2.36 Ω. Notice that the junction shape is similar to
an octagon, but the sloped edges are implemented as staircases so that all lines are on the
resolution grid. The drawn junction size is larger than the target size to compensate for the
0.5 µm width bias due to over-etching and anodization.

Figure 3.5 Junction library layout. (a) Junction definition layer with M2 contact to CE. (b) and
(c) Junction with shunt resistor.

Table 3-4 lists the junction sizes in our 6.5 kA/cm² process. The actual drawn size should be the
listed value minus the removed corner areas (which are too detailed to be listed here). Ideally,
the critical current value of each junction should be verified in testing; we use the junctions in
the layout before they are verified. The critical current values are the same as in the 1 kA/cm²
library for the convenience of design porting.


TABLE 3-4 6.5 kA/cm² junction layout library cell parameters

Ic (µA)   Rs (Ω)   Drawn size (µm x µm)   Target area (µm²)
120       4.93     2.0 x 2.1              1.85
130       4.55     2.1 x 2.1              2.00
140       4.23     2.1 x 2.2              2.15
151       3.92     2.2 x 2.2              2.32
163       3.63     2.2 x 2.3              2.51
174       3.40     2.3 x 2.3              2.68
186       3.18     2.3 x 2.4              2.86
198       2.99     2.4 x 2.4              3.05
211       2.81     2.5 x 2.5              3.25
224       2.64     2.5 x 2.6              3.45
238       2.49     2.6 x 2.6              3.66
251       2.36     2.6 x 2.7              3.86
264       2.24     2.7 x 2.7              4.06
279       2.12     2.7 x 2.8              4.29
294       2.01     2.8 x 2.8              4.52
309       1.92     2.8 x 2.9              4.75
325       1.82     2.9 x 2.9              5.00
339       1.75     2.9 x 3.0              5.22
356       1.66     3.0 x 3.0              5.48
373       1.59     3.0 x 3.1              5.74
390       1.52     3.1 x 3.2              6.00
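
The target areas in Table 3-4 are consistent with dividing the critical current by the critical current density. The short sketch below checks this for a few library cells; it assumes a critical current density of exactly 6.5 kA/cm² (65 µA/µm²), and the helper names are hypothetical. It also prints the Ic·Rs product of each cell, which the table entries suggest is held roughly constant.

```python
JC_UA_PER_UM2 = 65.0  # 6.5 kA/cm^2 expressed in uA/um^2


def target_area_um2(ic_ua, jc=JC_UA_PER_UM2):
    """Junction area (um^2) needed for a critical current ic_ua (uA)."""
    return ic_ua / jc


def ic_rs_product_mv(ic_ua, rs_ohm):
    """Ic * Rs product in mV."""
    return ic_ua * 1e-6 * rs_ohm * 1e3


# (Ic in uA, Rs in ohm, target area in um^2) copied from Table 3-4
library_cells = [(120, 4.93, 1.85), (251, 2.36, 3.86), (390, 1.52, 6.00)]

for ic, rs, listed_area in library_cells:
    area = target_area_um2(ic)
    print(f"Ic = {ic} uA: area = {area:.2f} um^2 (listed {listed_area:.2f}), "
          f"Ic*Rs = {ic_rs_product_mv(ic, rs):.3f} mV")
```

The same helper can be used in reverse when a schematic value has to be rounded to the nearest library cell.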
3.4.4.2 Inductance Estimation and Extraction

In our layout, double ground layers are used for all the RSFQ circuit inductances to reduce
undesired parasitic inductance. We used INDUCT calculations to make a reference sheet for layout,
and we use LMETER for inductance extraction after the layout is done. The concept of superconductor
metal line inductance and INDUCT are described in Section 3.09 of [1]; LMETER is described on the
SUNY RSFQ laboratory web site [46]. LMETER takes the layout database and the process information
and calculates the superconductor wire inductance, even for odd shapes. This is most useful where a
few lines meet at a junction. LMETER is based on Chang's work [48]. It shows a close match in the
strip line test case. For cases with complicated shapes, where it is most useful, it is believed in
the field to be accurate within ±10%. Process information such as the layer stack-up, the thickness
of the insulation layers, the superconductor penetration depth, and the line width bias for each
metal layer is included in a technology file, which is one of the input files for LMETER. For the
HYPRES and UCB processes, the technology files need to be compiled accordingly.

3.5  1:8  DEMUX  Design  and  Optimization 

The main design effort is focused on designing and optimizing the 1:2 DEMUX module. A 1:4 DEMUX
and a 1:8 DEMUX can then be easily built from the optimized 2-bit module.

3.5.1  20  GHz  DEMUX  Design,  Layout  and  Optimization 

A 20 GHz 1:2 DEMUX is designed and optimized for the 1 kA/cm² process. Fig. 3.6 shows an
asynchronous 1:2 DEMUX, its Moore diagram, and the connection JTL. The circuit structure was
suggested by A. F. Kirichenko [49], but the circuit parameters were developed independently. Other
related references for developing this circuit are [50][51][17]. The clock information is embedded
in the incoming data.

Figure 3.6 An asynchronous 1:2 DEMUX circuit. (a) Core circuit schematic. (b) Moore diagram.
(c) Connection JTL schematic.

Reading from the Moore diagram, this circuit has two internal states, state “0” and state “1”.
During power-up, the circuit is biased to its quiescent state, which is state “0”. J2 and J21 are
biased close to their critical currents, while J4 and J41 are biased away from theirs. The current
flowing in Lstore from left to right is small. This is equivalent to a more balanced biasing
between J2/J21 and J4/J41 superimposed on the circulating currents in the loops as marked in
Fig. 3.6. When an SFQ pulse arrives at Input/$\overline{\text{Input}}$, the circuit is switched to
state “1” and an output pulse is generated at Output0/$\overline{\text{Output0}}$ accordingly. In
state “1”, J2/J21 are biased away from their critical currents and J4/J41 are biased close to
theirs, and the circulating currents flow in the direction opposite to that in state “0”. The
current flowing in Lstore from left to right is larger. During the state transition from “0” to
“1”, if the input pulse comes into Input, junctions J2, J3 and J61 switch and the output pulse is
generated at Output0. If the input pulse comes into $\overline{\text{Input}}$, junctions J21, J31
and J6 switch and the output pulse is generated at $\overline{\text{Output0}}$. The transition from
state “1” to state “0” is likewise triggered by an SFQ pulse at Input/$\overline{\text{Input}}$,
and an output pulse is generated at Output1/$\overline{\text{Output1}}$ correspondingly. During
this transition, if the input pulse is at Input, junctions J1, J4 and J71 switch and the output
pulse goes to Output1. If the input pulse is at $\overline{\text{Input}}$, junctions J11, J41 and
J7 switch and the output pulse goes to $\overline{\text{Output1}}$. So this new 1:2 DEMUX circuit
behaves like a dual-rail T flip-flop. The input pulses from Input/$\overline{\text{Input}}$ are
diverted to Output0/$\overline{\text{Output0}}$ and Output1/$\overline{\text{Output1}}$
alternately. The output data rate is reduced to one half of the input data rate.
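
The toggling behavior described by the Moore diagram can be captured in a small behavioural model. The sketch below is a logical abstraction only, with no junction dynamics; the class `Demux1to2` and the function `demux_tree` are hypothetical names, and the binary-tree arrangement of three levels of 1:2 modules is an assumed way of forming a 1:8 DEMUX from the 2-bit module. Because the data are dual-rail, every bit arrives as a pulse on one of the two rails, so every bit toggles the module it passes through.

```python
class Demux1to2:
    """Every input pulse toggles the state and leaves on alternating outputs."""

    def __init__(self):
        self.state = 0  # quiescent state "0" after power-up

    def pulse(self):
        out = self.state   # state "0" -> Output0 pair, state "1" -> Output1 pair
        self.state ^= 1    # each input pulse switches the internal state
        return out


def demux_tree(bits, depth=3):
    """Route a serial dual-rail bit stream through a binary tree of 1:2 modules.

    Returns a dict mapping the path of outputs taken, e.g. (0, 1, 0),
    to the list of bits delivered to that output channel.
    """
    nodes = {}      # one Demux1to2 per tree node, keyed by the path leading to it
    channels = {}
    for bit in bits:
        path = ()
        for _ in range(depth):
            node = nodes.setdefault(path, Demux1to2())
            path += (node.pulse(),)
        channels.setdefault(path, []).append(bit)
    return channels


# Use the position in the stream as the payload to make the routing visible
channels = demux_tree(list(range(16)))
for path in sorted(channels):
    print(path, channels[path])
```

In this model each of the eight channels receives every eighth bit, so the output data rate of each channel is one eighth of the input rate.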

Comparing the circuit schematic of the 1:2 DEMUX with that of the T flip-flop in Fig. 1.11, the
2-bit DEMUX is similar to two T flip-flops combined, except that junctions J6, J61, J7 and J71 are
added to prevent pulses arriving on one input rail from entering the outputs of the complementary
rail. The resistor R in the T flip-flop is also removed from the 1:2 DEMUX because of the
difficulty of placing it in the layout. A set of working parameters of the T flip-flop is used as
the starting point for designing the 2-bit DEMUX. The dynamics described in the Moore diagram are
referred to for the parameter adjustment. Fig. 3.7a shows the input/output voltage waveforms of a
correctly functioning 2-bit DEMUX. Fig. 3.7b shows the corresponding phase waveforms of the
junctions in the JTLs connected to the inputs/outputs of the 2-bit DEMUX. Each 2π phase transition
in the junctions produces an SFQ voltage pulse at the corresponding input/output.

Figure 3.7 Simulation waveforms of a correctly functioning 2-bit DEMUX. (a) Input/output
voltages. (b) Input/output JTL junction phases.

After the correct functioning is achieved, a pre-layout optimization is done in MALT. Details of
the optimization procedure are explained below. The pass/fail criterion is automatically generated
based on the waveforms of the circuit with the initial parameters. Input/output pulse positions are
extracted as the time points at which the junction phases equal (2k + 3/2)π, where k is an integer.
During the optimization, the phase of each output junction is checked at the nominal pulse
positions plus and minus a delay variation. The delay variation is set to 20 ps in the optimization
and can be varied according to the design. If the difference between the simulated phase and the
expected phase is larger than the fail threshold, the run is considered a fail. The fail threshold
of the phase is set to 2.0 in the optimization. The input junction phases are checked at the last
check point. The data sequences in Fig. 3.7 are used. Two stages of JTLs are connected to each of
the inputs/outputs and are included in the optimization. Due to the symmetry of the circuit,
symmetric parameter pairs such as J1-J11, J2-J21 and L0-L2 are set to vary together. The most
critical parameters, the global inductance variation XL and the inverse global critical current
density DIcb, are included in all the iterations. DIcb is set to be static. Other parameters, the
individual inductances and individual junction critical current values, are grouped and optimized
in different runs. The dc bias voltage Vbias is also allowed to vary in some runs. The parameter
values after the pre-layout optimization and the related margins are reported in the left columns
of Table 3-5. The margin of XL (-27.0%, +54.0%) is large


TABLE 3-5 Pre-layout and post-layout margin calculation.

              (a) Pre-layout simulation           (b) Post-layout simulation          (c) Post-layout simulation
              (after optimization)                (before re-optimization)            (after re-optimization)
Parameter     Value              Margin (%)       Value              Margin (%)       Value              Margin (%)
XL            1                  (-27.0, +54.0)   1                  (-19.4, +35.2)   1                  (-30.6, +50.8)
DIcb          1                  (-21.0, +17.0)   1                  (-18.1, +53.9)   1                  (-26.9, +50.8)
XIcb          1                  (-14.5, +26.6)   1                  (-35.0, +22.1)   1                  (-33.7, +36.8)
Vbias         3.264 mV           (-18.8, +20.3)   2.5 mV             (-9.4, +22.7)    2.5 mV             (-14.4, +11.7)
Rb0-Rb2       13.61 Ω            (-42.6, +100*)   13.6 Ω             (-55.6, +58.6)   12.7 Ω             (-48.1, +100*)
Rb1           5.75 Ω             (-26.1, +29.6)   5.8 Ω              (-36.9, +18.0)   5.5 Ω              (-33.1, +30.5)
Rbjtl         9.325 Ω            (-25.0, +100*)   7.61 Ω             (-30.6, +38.3)   7.12 Ω             (-21.9, +77.3)
Ic1-Ic11      279 µA             (-28.7, +39.4)   279 µA             (-11.9, +30.5)   264 µA             (-20.6, +30.5)
Ic2-Ic21      224 µA             (-53.6, +40.2)   224 µA             (-53.1, +18.0)   211 µA             (-50.6, +30.5)
Ic3-Ic31      174 µA             (-51.7, +51.7)   174 µA             (-46.9, +41.1)   174 µA             (-56.9, +33.6)
Ic4-Ic41      151 µA             (-80*, +100*)    151 µA             (-71.9, +66.4)   151 µA             (-55.6, +82.0)
Ic5-Ic51      264 µA             (-80*, +83.3)    264 µA             (-71.9, +32.0)   251 µA             (-76.9, +49.2)
Ic6-Ic61      294 µA             (-34.0, +47.6)   294 µA             (-55.6, +36.7)   279 µA             (-50.6, +39.8)
Ic7-Ic71      294 µA             (-18.4, +23.8)   294 µA             (-31.9, +27.3)   294 µA             (-30.6, +21.1)
Icjtl         250 µA             (-21.0, +44.0)   251 µA             (-26.9, +19.5)   251 µA             (-15.8, +38.5)
L1-L3         3.20 pH            (-80*, +100*)    4.2 pH             (-80*, +38.3)    4.3 pH             (-80*, +72.7)
L0-L2         0.89 pH            (-80*, +100*)    1.1 pH             (-75.6, +68.0)   1.1 pH             (-80*, +88)
Lstore        2.77 pH            (-27.9, +100*)   3.0 pH             (-51.9, +77.3)   2.9 pH             (-66.9, +100*)
L5-L7         3.6 pH             (-80*, +100*)    3.3 pH             (-43.1, +100*)   3.4 pH             (-80*, +100*)
L6-L8         3.6 pH             (-80*, +100*)    3.3 pH             (-80*, +100*)    3.3 pH             (-80*, +100*)
Ljtl0-Ljtl2   1.8 pH             (-80*, +100*)    1.45 pH            (-80*, +100*)    1.45 pH            (-80*, +100*)
Ljtl1         3.6 pH             (-80*, +100*)    2.8 pH             (-80*, +100*)    2.8 pH             (-80*, +100*)
Parasitic Ls  N/A                N/A              Stated separately  (-80*, +100*)    Stated separately  (-80*, +100*)

*(-80, +100) is the maximum parameter variation range in the margin calculation. The actual
circuit parameter margin may be larger.


Figure  3.8  Layout  of  the  2-bit  DEMUX. 

considering that the 3σ global L variation is 15%. The margin of XIcb (-14.5%, +26.6%) is fair,
since the foundry guarantees the global Ic variation to be within 15%. The dc bias voltage
margin is (-18.8%, +20.3%), i.e., the operational dc bias voltage range is (2.65 mV, 3.93 mV)
with the center voltage at 3.264 mV. The critical parameter margin is the lower margin of
Ic7-Ic71 (-18.4%). The pre-layout dc bias margin of a 1:8 DEMUX based on the above 2-bit
DEMUX is (-18%, +18%). Because MALT cannot handle more than eight parameters in the same
optimization setting, good results were difficult to achieve without carefully grouping the
parameters and running many iterations. The results achieved above can be further improved.
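The fixed-time-point pass/fail check described for the MALT optimization can be illustrated with a small sketch. This is not MALT code. It assumes that the phase is compared with its expected value once just before and once just after each nominal pulse position, that the fail threshold of 2.0 is in radians, and that a smooth 2π step (the hypothetical `toy_phase` helper) is an adequate stand-in for a simulated junction phase.

```python
import math

def toy_phase(pulse_times_ps, width_ps=1.0):
    """Toy junction phase: every SFQ pulse adds a smooth 2*pi step (assumed shape)."""
    def phase(t_ps):
        return sum(math.pi * (1.0 + math.tanh((t_ps - tp) / width_ps))
                   for tp in pulse_times_ps)
    return phase

def passes_fixed_time_check(phase, nominal_times_ps,
                            delay_var_ps=20.0, fail_threshold=2.0):
    """Compare the phase with its expected value before and after each nominal pulse."""
    for k, t0 in enumerate(nominal_times_ps):
        expected_before = 2.0 * math.pi * k        # k pulses have already occurred
        expected_after = 2.0 * math.pi * (k + 1)   # pulse k has also occurred
        if abs(phase(t0 - delay_var_ps) - expected_before) > fail_threshold:
            return False
        if abs(phase(t0 + delay_var_ps) - expected_after) > fail_threshold:
            return False
    return True

# Nominal output pulse positions (illustrative values, in ps).
nominal = [100.0, 200.0, 300.0]
on_time = toy_phase(nominal)
slightly_late = toy_phase([t + 10.0 for t in nominal])   # inside the 20 ps window
too_late = toy_phase([t + 30.0 for t in nominal])        # outside the 20 ps window

print(passes_fixed_time_check(on_time, nominal))
print(passes_fixed_time_check(slightly_late, nominal))
print(passes_fixed_time_check(too_late, nominal))
```

In this sketch a pulse train that arrives 30 ps late is rejected even though every pulse is present, which is the latency sensitivity of a fixed-time-point criterion.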

Fig. 3.8 shows the layout based on the above parameters. To facilitate cascading, one input
was wrapped around to sit next to the other input. Moats were added near the junctions and
wherever space allowed. Moats are areas in the layout where the ground plane is removed to
avoid flux trapping in the circuits. Because the effect of the connection JTLs on the circuit
performance was not given special attention at this stage, standard JTLs from the library were
used instead of the ones resulting from the optimization. Bias resistance values were not
scaled to center the dc bias voltage range at 2.5 mV in this layout; this is corrected in the
post-layout optimization. Testing results based on this layout implementation without further
optimization will be reported in Section 5.2.2.1 and Section 5.2.2.2.

Figure 3.9 2-bit DEMUX schematic with parasitic inductances.

Fig. 3.9 shows the post-layout schematic with the parasitic inductances. The updated
parameter values and margins analyzed in MALT are listed in the middle columns of Table 3-5.
The parasitic inductance values in Fig. 3.9 are: L1p1 = 0.04 pH, L11p1 = 0.06 pH,
L1p2 = 0.57 pH, L11p2 = 0.57 pH, L3p1 = 0.05 pH, L31p1 = 0.06 pH, L3p2 = 0.62 pH,
L31p2 = 0.63 pH, L6p1 = 0.02 pH, L61p1 = 0.02 pH, L6p21 = 0.32 pH, L61p21 = 0.22 pH,
L6p22 = 0.32 pH, L61p22 = 0.31 pH, L7p1 = 0.02 pH, L71p1 = 0.02 pH, L7p21 = 0.25 pH,
L71p21 = 0.24 pH, L7p22 = 0.32 pH, L71p22 = 0.31 pH. The margins of the parasitic
inductances are all very large, beyond (-80%, +100%). However, the parasitic inductances
change the circuit bias condition and reduce other parameter margins. The global
inductance XL margin is reduced to (-19.4%, +35.2%). The margins of the global critical
current XIcb change to (-35.0%, +22.1%). The dc bias voltage margins drop to
(-9.4%, +22.7%), so the operational dc bias voltage range is (2.27 mV, 3.07 mV) with the
center voltage at 2.5 mV. The critical parameter margin is that of Ic1 and Ic11 (-11.9%).
The pass/fail criteria used in MALT require that the output pulses arrive within 20 ps of
the nominal positions, which is not a necessary requirement for asynchronous circuits if the
latency is not in the specification.
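The conversion from a percentage dc bias margin to an operating voltage window is simple arithmetic, sketched below for the pre-layout (3.264 mV center) and post-layout (2.5 mV center) cases. The function name `bias_window_mv` is only illustrative.

```python
def bias_window_mv(center_mv, lower_pct, upper_pct):
    """Return the (low, high) bias voltage window in mV for a percentage margin."""
    low = center_mv * (1.0 + lower_pct / 100.0)
    high = center_mv * (1.0 + upper_pct / 100.0)
    return low, high

# Pre-layout: center 3.264 mV, margin (-18.8%, +20.3%).
pre_layout = bias_window_mv(3.264, -18.8, 20.3)
# Post-layout before re-optimization: center 2.5 mV, margin (-9.4%, +22.7%).
post_layout = bias_window_mv(2.5, -9.4, 22.7)

print("pre-layout window  (mV): %.3f to %.3f" % pre_layout)
print("post-layout window (mV): %.3f to %.3f" % post_layout)
```

The pre-layout window evaluates to about 2.650 mV to 3.927 mV, and the post-layout window to about 2.265 mV to 3.068 mV.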

With the same pass/fail criterion as the one used by MALT, the dc bias margin calculated in
WRspice is (-9.3%, +22.5%), which agrees with the MALT report. In WRspice, more flexible
pass/fail criteria can be scripted, and two other criteria have been tried. In the first, the
sequence check, the sequence of the output pulses is checked for every pulse, but not at fixed
time points. The pulse interval has to be within 50 ps +/- tvar, where tvar is the allowed
interval variation. We set tvar = 20 ps in our calculation. In the second, the final phase
check, a fixed number of input pulses are fed into the circuit and the final junction phases
are checked after the last junction transition. With this approach, as long as the waiting
period after the last junction transition is long enough, sufficient latency variation is
allowed for the circuit. The final phase check is less strict than the sequence check, since
the details of the pulse sequence and pulse interval are ignored. On the other hand, because
the sequence check uses measurement results from the simulation, it takes 3 to 4 times longer
in the margin and yield calculations. The dc bias margin with the sequence check is
(-8.6%, +34.9%). The one with the final phase check is (-9.3%, +34.9%). The two results are
close. In comparison, the MALT criterion gives a much smaller upper-end dc bias margin, which
shows the effect of the latency variation. The circuit yields calculated in WRspice are
(70% +/- 3%) using the MALT criterion, (71% +/- 3%) using the sequence check, and
(77% +/- 3%) using the final phase check, with a 95% confidence level. In all three
calculations, the same data patterns are applied and the total number of Monte Carlo runs is
the same, 798. Table 3-6 summarizes the dc bias margin and yield calculation results using
the different criteria. The sequence check is a good choice for the asynchronous DEMUX
circuit compared with the more pessimistic MALT criterion and the more optimistic final
phase check criterion. The low yield calls for a post-layout circuit re-optimization.
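The two alternative criteria can be sketched in a few lines. This is not the WRspice script used in this work. It assumes that the pulse times of one stream have already been measured from the simulation, and the 2.0 rad tolerance in the final phase check is an assumed value chosen only for illustration.

```python
import math

def sequence_check(pulse_times_ps, expected_count, period_ps=50.0, tvar_ps=20.0):
    """Pass if every pulse is present and each interval is within period +/- tvar."""
    if len(pulse_times_ps) != expected_count:
        return False
    intervals = [later - earlier
                 for earlier, later in zip(pulse_times_ps, pulse_times_ps[1:])]
    return all(abs(dt - period_ps) <= tvar_ps for dt in intervals)

def final_phase_check(final_phase_rad, expected_pulses, tolerance_rad=2.0):
    """Pass if the junction phase ends near 2*pi times the expected pulse count."""
    return abs(final_phase_rad - 2.0 * math.pi * expected_pulses) <= tolerance_rad

# A pulse train delayed by 30 ps as a whole still has regular 50 ps intervals.
late_but_regular = [130.0, 180.0, 230.0, 280.0]
# A train with one 80 ps gap violates the interval requirement.
irregular = [100.0, 150.0, 230.0, 280.0]

print(sequence_check(late_but_regular, expected_count=4))
print(sequence_check(irregular, expected_count=4))
print(final_phase_check(4 * 2.0 * math.pi + 0.3, expected_pulses=4))
```

Neither check refers to absolute pulse positions, so a uniform latency shift does not cause a fail, in contrast to the fixed-time-point criterion.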

TABLE 3-6 Post-layout dc bias margin and yield calculation results before circuit
re-optimization, using different pass/fail criteria in WRspice.

Criterion                                 dc bias margin      Yield range w/ 95% confidence level
MALT criterion (fixed time point check)   (-9.3%, +22.5%)     (67% - 73%)
Sequence check                            (-8.6%, +34.9%)     (68% - 74%)
Final phase check                         (-9.3%, +34.9%)     (74% - 80%)
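The width of the quoted yield ranges is consistent with a binomial confidence interval for 798 Monte Carlo runs. The sketch below uses the normal approximation to the binomial distribution, which is an assumption here, since the method used to obtain the intervals is not stated.

```python
import math

def yield_half_width(yield_fraction, runs, z=1.96):
    """Half-width of the confidence interval on a Monte Carlo yield estimate.

    Normal approximation to the binomial; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(yield_fraction * (1.0 - yield_fraction) / runs)

RUNS = 798
for name, y in [("MALT criterion", 0.70),
                ("Sequence check", 0.71),
                ("Final phase check", 0.77)]:
    half = yield_half_width(y, RUNS)
    print("%-18s %.0f%% +/- %.1f%%" % (name, 100 * y, 100 * half))
```

For yields between 70% and 77%, the half-width is close to 3 percentage points, so tightening the interval to about 1 point would need roughly nine times as many runs.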

The inductance values are kept unchanged in the post-layout re-optimization. The MALT
results are reported in the right columns of Table 3-5. The margin of XL recovers to
(-30.6%, +50.8%). The margin of XIcb recovers to (-33.7%, +36.8%). The dc bias voltage
margin is more centered, (-14.4%, +11.7%). The critical parameter margin improves to
-15.8%, the lower margin of Icjtl. The margin of Icjtl gets worse after the re-optimization
because Icjtl is not among the optimized parameters, owing to the program limit on the total
number of parameters that can be optimized. The circuit dc bias margin is verified in
WRspice. Further yield calculation in WRspice shows that the re-optimization improves the
circuit yield. The total number of Monte Carlo runs for the yield calculation is 798.
Table 3-7 summarizes the dc bias margin and yield results in WRspice after the post-layout
re-optimization using the different criteria. Once the circuit is optimized, the yield values
obtained with the different criteria are close to one another.


TABLE 3-7 Post-layout dc bias margin and yield calculation results after circuit
re-optimization, using different pass/fail criteria in WRspice.

Criterion                                 dc bias margin      Yield range w/ 95% confidence level
MALT criterion (fixed time point check)   (-14.5%, +12.9%)    (85% - 89%)
Sequence check                            (-14.5%, +25.2%)    (87% - 91%)
Final phase check                         (-14.7%, +25.2%)    (89% - 93%)

MALT optimization did help to improve the circuit yield to some extent. Its main limitation
is that a maximum of eight parameters can be optimized together. Optimization based on one
group of parameters can hurt the margins of parameters that are not included and therefore
does not necessarily improve the overall yield. Margin and yield verification in WRspice is
necessary, since the yield reported by MALT takes into account the variations of only some of
the parameters and the MALT pass/fail criterion is not the most appropriate one.


Fig. 3.10 shows the 2-bit DEMUX dc bias margin for operation frequencies above 20 GHz. The
dc bias margin of the 2-bit DEMUX varies little at frequencies below 20 GHz. Beyond 20 GHz,
however, the lower-end dc bias margin starts to shrink and crosses zero at around 35 GHz,
while the upper-end dc bias margin remains above 20% up to 50 GHz. For operation above
20 GHz, this circuit therefore needs to be re-optimized for the specific frequency.
Furthermore, a process with higher current density may be preferred to overcome the speed
limitation.


The layouts of a 1:4 DEMUX and a 1:8 DEMUX are implemented based on the above
re-optimization results. Fig. 3.11 shows the micrograph of a 1:4 DEMUX. The test results of
this layout will be reported in Section 5.2.2.3 and Section 5.2.2.4. Fig. 3.12 is the
micrograph of a 1:8 DEMUX with a DDST on-chip high-speed test system. The concept of the
on-chip high-speed test system will be discussed in Chap. 4. The configuration above is
actually used to verify the 1:4 DEMUX by on-chip high-speed testing and to verify the 1:8
DEMUX operation directly. Verifying the 8-bit DEMUX on-chip requires an 8-bit shift register
and an 8-bit clock generator. We only had a verified 4-bit shift register and a 4-bit clock
generator. This chip could not be demonstrated because of a layout mistake.

Figure 3.10 2-bit DEMUX dc bias margins vs. frequency. The data are from post-layout
simulation after re-optimization, including the parasitic inductances. The marked data
points are for the frequencies simulated.

Figure 3.11 Micrograph of a 1:4 DEMUX.

Figure 3.12 Micrograph of a 1:8 DEMUX with DDST on-chip high-speed test system.


3.5.2  50  GHz  DEMUX  Design,  Layout,  and  Optimization 

A 50 GHz 1:8 DEMUX is designed in the 6.5 kA/cm² process based on the 20 GHz design in the
1 kA/cm² process. Again, the optimization of the 2-bit DEMUX is the design focus. To overcome
the limitation of MALT, a different optimization tool, WinS, is used in the 50 GHz design.
The performance of the 1:8 DEMUX based on the optimized 2-bit module is verified in WRspice.

The performance of the 20 GHz design is boosted simply by replacing the 1 kA/cm² junction
model with the 6.5 kA/cm² junction model. Fig. 3.13 shows the 1:2 DEMUX simulation
waveforms at 50 GHz. A comparison of the dc bias margins as a function of the operating
frequency is illustrated in Fig. 3.14. Parasitic inductances are included in the simulation.
Below 50 GHz, the circuit dc bias margins in the 6.5 kA/cm² process recover to the same level
as those at 20 GHz in the 1 kA/cm² process, about (-12%, +24%). Above 50 GHz, the dc bias
margin starts to shrink. At 80 GHz, the lower-end dc bias margin is reduced to zero. The
20 GHz design is therefore already a good starting point for further optimization. The goal
of the optimization is to center the dc bias margin and expand the operational frequency
range with good yield.

Figure 3.13 1:2 DEMUX simulation waveforms at 50 GHz.

The 20 GHz design parameters are used as the initial values for the 50 GHz design
optimization. First, the circuit optimization is done in WinS without any parasitic
inductances included. After the optimization, the dc bias margins reported by WinS are
(-27.4%, +29.5%), and the critical parameter margin is that of Ic7 and Ic71 (-27.1%).
WRspice verified that the dc bias margins are (-25.6%,

Figure 3.14 Dc bias margin comparison of the 20 GHz 2-bit DEMUX design using the 1 kA/cm²
process (solid lines) and the 6.5 kA/cm² process (dashed lines). The latter is not optimized.
Input data pattern is the same as that in Fig. 3.13.

Fig. 3.15 shows the layout of the 1:2 DEMUX in the 6.5 kA/cm² process. Moats are
systematically added around the superconductor devices, junctions, and inductors.

When the layout parasitic inductances are included, the circuit performance degrades. The
dc bias margins checked in WinS are (-29.2%, +17.2%), and the critical parameter margin is
that of Ic1 and Ic11 (+13.4%). In WinS, no parasitic inductances can be added to the built-in
RSJ junction model, so only the parasitic inductances between the junctions are included in
the WinS optimization and parameter margin evaluation. WRspice, with the junction parasitic
inductances included, showed dc bias margins of (-21.7%, +13%).

Post-layout re-optimization is done to recover the circuit margins. WinS reported dc bias
margins of (-28.8%, +30.6%), with the critical parameter margin being that of Ic1 and Ic11
(+27.8%). WRspice, with the extra junction parasitic inductances, verified dc bias margins of
(-26.1%, +29.9%), with the critical parameter margin being that of Ic1 and Ic11 (+25%).
Since RSFQ circuit components are connected by inductances and interfere with the dc bias
current distribution of the neighboring cells, we connect the DEMUX core cell to a few stages
of standard JTLs during optimization. When this optimized cell is used in the future,
standard JTLs should likewise be used to connect it to other circuits.

Figure 3.15 1:2 DEMUX layout in the 6.5 kA/cm² process.


Fig.  3.16  shows  the  50  GHz  1:2  DEMUX  circuit  schematic  with  key  circuit  parameters.  For 
simplicity,  the  junction  parasitic  inductances  are  not  shown  here.  Fig.  3.17  shows  the  WinS  margin 
calculation  results  after  the  post-layout  reoptimization. 

We further investigated the 1:2 DEMUX dc bias margins when the operation frequency is
varied. Fig. 3.18 shows the variation of the dc bias margins of the 1:2 DEMUX with frequency
under different conditions. The input data pattern is the same as that in Fig. 3.13 unless
otherwise noted. Comparing curve 1 in Fig. 3.18 with the 6.5 kA/cm² margins in Fig. 3.14, we
can see that the pre-layout optimization improves the circuit dc bias margins dramatically.
Comparing curve 3 with curve 1 and curve 2 in Fig. 3.18, we can tell that the post-layout
re-optimization recovers the dc bias margins almost to the pre-layout level, with a slight
loss. Above 50 GHz, the lower dc bias margin decreases continuously and shrinks to zero at
around 100 GHz. For this circuit to operate above 50 GHz, it should therefore be re-optimized
at that frequency for better circuit parameter margins. This re-optimized 1:2 DEMUX can
operate up to 125 GHz with a reduced dc bias margin of (+16.5%, +29.9%).

Figure 3.16 50 GHz 1:2 DEMUX schematic with parasitic inductances. The key circuit
parameters after the re-optimization are: Ic1 = Ic11 = 264 µA, Ic2 = Ic21 = 224 µA,
Ic3 = Ic31 = 186 µA, Ic4 = Ic41 = 264 µA, Ic5 = Ic51 = 264 µA, Ic6 = Ic61 = 264 µA,
Ic7 = Ic71 = 264 µA, Ic8 = Ic81 = 251 µA, Ic9 = Ic91 = 251 µA; L1 = L2 = 0.482 pH,
L3 = L4 = 2.373 pH, L5 = L51 = 4.981 pH, L6 = L61 = 2.736 pH, L8 = L81 = 5.183 pH,
L9 = L91 = 3.74 pH, Lstore = 2.636 pH; IB1 = 511 µA, IB2 = IB21 = 213 µA,
IB8 = IB81 = 117 µA, IB9 = IB91 = 108 µA.

Figure 3.17 WinS margin report of the 50 GHz 1:2 DEMUX after post-layout re-optimization.

We also investigated the dc bias margin of the 1:2 DEMUX when a simplified input pattern,
all 1s, is fed to one input. This corresponds to our test plan, in which no DC/SFQ converter
is used to convert the external pattern generator signals. The all-1s data pattern is
generated at one input by over-biasing the input Josephson junction above its critical
current, which works up to very high frequency (a few hundred gigahertz). Curve 4 in
Fig. 3.18 shows the result including parasitic inductances. With the simplified input data
pattern, the dc bias margin is wider than in the case with the more complicated
complementary input data pattern. The circuit can operate up to 222 GHz as simulated in
WRspice.

Figure 3.18 1:2 DEMUX dc bias margins vs. frequency, (a) in millivolts, (b) in percentage.

When the 1:8 DEMUX is built from the 1:2 DEMUX cells according to the binary tree structure
presented earlier in Fig. 3.2, standard JTLs are used for the connections. The dc bias
margins simulated in WRspice are very close to the 2-bit DEMUX result, which demonstrates
that our strategy of including standard JTLs in the optimization works.
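The data routing of the binary tree can be illustrated with a purely behavioral sketch. It assumes that each 1:2 cell sends successive data alternately to its two outputs, starting with the first, and it ignores pulse timing, the complementary inputs, and all circuit-level effects. The function names are illustrative only.

```python
def demux_1to2(stream):
    """Ideal 1:2 DEMUX: successive items go alternately to the two outputs."""
    return stream[0::2], stream[1::2]

def demux_tree(stream, levels):
    """Cascade ideal 1:2 cells into a binary tree with 2**levels outputs."""
    if levels == 0:
        return [stream]
    first, second = demux_1to2(stream)
    return demux_tree(first, levels - 1) + demux_tree(second, levels - 1)

# Label 16 consecutive input data by their arrival order.
arrivals = list(range(16))
outputs = demux_tree(arrivals, levels=3)   # 1:8 DEMUX

for index, data in enumerate(outputs):
    print("tree output %d receives input data %s" % (index, data))
```

Each of the eight outputs receives every eighth input datum, so each output runs at one eighth of the input rate. In this idealized model the outputs, taken in tree order, carry the input data in bit-reversed order (0, 4, 2, 6, 1, 5, 3, 7).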

3.6  MUX  Simulation  and  Optimization  Result 

3.6.1  20  GHz  Ripple  Logic  MUX  Design,  Layout  and  Optimization 

The architecture of the MUX was discussed in Section 3.2.2. The building blocks include
confluence buffers, RS flip-flops, D flip-flops, and T flip-flops. All the basic cells were
built and verified in the 1 kA/cm² HYPRES process in previous projects by other members of
the UCB cryogroup.

We built a 2:1 MUX based on the old cells. The block diagram of the 2:1 MUX is shown in
Fig. 3.19. It was fabricated in the HYPRES 1 kA/cm² process and was shown to have (-7%, +7%)
dc bias margins and to work up to 4 GHz. The detailed testing results are in Section 5.2.1.
Compared with the block diagram in Fig. 3.3b, Dffs are used to latch the parallel input data
instead of Tffs. The advantage of using Dffs is that there is no need to take care of the
timing between Clock1 and Clock2 within the MUX. However, when a 2-bit MUX is expanded to an
8-bit MUX, the layout of the CB network for the complementary outputs of the Dffs becomes
very difficult, since connections in RSFQ circuits are made with JTLs instead of metal wires.
We therefore decided to use RSffs to latch the input data in the further design, which halves
the complexity of the CB network. It is also advantageous to reduce the number of Dffs used
in the circuit, since the Dff has the smallest dc bias margin among all the basic blocks used
in the MUX.

Figure 3.19 A 2:1 MUX block diagram.

We optimized all the basic blocks for better dc bias margin and yield. The optimizations are
mainly done in either MALT or WinS. Key parasitic inductances are included in the simulation
and the optimization. Fig. 1.11 shows the Tff circuit diagram with the circuit parameters.
Fig. 3.20 shows the CB circuit diagram with the circuit parameters. Fig. 3.21 shows the RSff
circuit diagram with the circuit parameters. Fig. 3.22 shows the Dff circuit diagram with the
circuit parameters. The parasitic inductances in the storage loop are carefully extracted and
included in the optimization.

Monte Carlo analysis is also used to estimate the clock/data path delay variations caused by
the process variations. Fig. 3.3b shows the block diagram of the 8:1 MUX. The Dff has a
setup/hold time requirement, so the delay between Clock1 and Clock2 has to be designed to
compensate for the long delay from Clock1 to the Data input of the Dff, which is around
110 ps, much larger than one 20 GHz clock cycle. There are eight Clock1-to-Data_Dff signal
paths in an 8:1 MUX. One of the eight clock/data paths is highlighted in Fig. 3.3b for
illustration. It consists of three Tffs, one RSff, and three CBs. Because of process
variations, the delays along the eight paths can differ from each other. Fig. 3.23 shows the
simulation waveforms used to characterize the delay. Data_Dff has eight consecutive pulses,
each of which goes through one of the eight clock/data signal paths. In each simulation run
of the Monte Carlo analysis, each of the seven Tffs, each of the eight RSffs, and each of the
seven CBs has different circuit parameters, which are pseudo-randomly generated based on the
local process variations in Table 3-1. The histogram of the delay variations with the
Gaussian fitting curve is plotted in Fig. 3.24. The total count is 102. The standard
deviation is 1.38 ps, so the 6σ delay variation is 8.3 ps. With a 50 ps clock period at
20 GHz, we still have enough timing margin reserved for the Dff setup/hold time requirement.

Figure 3.20 A circuit diagram of the confluence buffer with optimized parameters in the
1 kA/cm² Nb process. Ic1 = Ic2 = 294 µA, Ic3 = Ic4 = 279 µA, Ic5 = 238 µA; L1 = L2 = 2.91 pH,
L3 = 3.67 pH, L4 = 3.6 pH, Lp1 = Lp2 = 0.39 pH; Ib1 = 407 µA, Ib2 = 123 µA.

Figure 3.21 A circuit diagram of the RSff with optimized parameters in the 1 kA/cm² Nb
process. Ic1 = 224 µA, Ic2 = 325 µA, Ic3 = 325 µA, Ic4 = 294 µA; L1 = 2.14 pH, L2 = 2.99 pH,
L3 = 3.60 pH, Lstore = 4.13 pH, Lp = 0.4 pH; Ib = 240 µA.

Figure 3.22 A circuit diagram of the Dff with optimized parameters in the 1 kA/cm² Nb
process. Ic1 = 151 µA, Ic2 = 186 µA, Ic3 = 309 µA, Ic4 = 224 µA, Ic5 = 339 µA, Ic6 = 279 µA,
Ic7 = 198 µA, Ic8 = 373 µA; L1 = 2.54 pH, L2 = 0.98 pH, L3 = 2.54 pH, L4 = 3.22 pH,
Ls = 3.51 pH, L5 = 3.71 pH, L6 = 3.71 pH, Lp1 = 0.29 pH, Lp2 = Lp3 = Lp5 = Lp6 = 0.20 pH,
Lp4 = Lp7 = 0.39 pH, Lp8 = 0.59 pH; Ib1 = 307 µA, Ib2 = 284 µA.
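The timing budget argument above amounts to the following arithmetic, shown here as a short sketch. Treating the 6σ figure as the full spread of the path delay is taken from the text; the function names are illustrative.

```python
def six_sigma_spread_ps(sigma_ps):
    """Delay variation taken as six standard deviations of the path delay."""
    return 6.0 * sigma_ps

def fraction_of_clock_period(spread_ps, clock_ghz):
    """Express a delay spread as a fraction of one clock period."""
    period_ps = 1000.0 / clock_ghz
    return spread_ps / period_ps

# 20 GHz MUX in the 1 kA/cm2 process: sigma = 1.38 ps from the Monte Carlo histogram.
spread = six_sigma_spread_ps(1.38)
fraction = fraction_of_clock_period(spread, clock_ghz=20.0)

print("6-sigma delay variation: %.2f ps" % spread)
print("fraction of the 50 ps clock period: %.1f%%" % (100.0 * fraction))
```

The spread of 8.28 ps occupies about 17% of the 50 ps clock period, leaving the rest for the Dff setup/hold window and the nominal clock skew.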

Fig. 3.25 shows the waveforms of a correctly functioning 20 GHz 8:1 MUX. Clock1 is at
20 GHz. Inputs D0, D1, D5, D6, and D7 are 2.5 GHz pulses, and D2, D3, and D4 are all 0s, so
Output is a 20 GHz “11000111” pattern. The complementary Output is a 20 GHz “00111000”
pattern. The dc bias margin of the 8:1 MUX is limited by the Dff and is the same as that of
the Dff.

Figure 3.23 Waveforms of the 20 GHz 8:1 MUX data path delay simulation.

Figure 3.24 Histogram of the delay variation for one data path in the 20 GHz 8:1 MUX (counts
for each bin, total = 102). σ = 1.38 ps.

Figure 3.25 Waveforms of the 20 GHz 8:1 MUX simulation. D2, D3, D4 are all 0s.

Figure 3.26 Layout of a 20 GHz 8:1 MUX in the 1 kA/cm² UCB Nb process.
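The expected output pattern can be reproduced with a behavioral model of the serialization. The sketch assumes that the inputs are read out in the order D0 to D7 within each frame, which is consistent with the quoted pattern, and it models only the logic, not the RSFQ timing.

```python
def mux_serialize(channels):
    """Interleave parallel input streams: D0..D7 of frame 0, then D0..D7 of frame 1, ..."""
    frames = len(channels[0])
    return [bits[frame] for frame in range(frames) for bits in channels]

def complement(bits):
    """Complementary output: a pulse wherever the true output has none."""
    return [1 - b for b in bits]

# Two frames: D0, D1, D5, D6, D7 carry a pulse in every frame; D2, D3, D4 are all 0s.
channels = [[1, 1], [1, 1], [0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]

output = mux_serialize(channels)
first_frame = "".join(str(b) for b in output[:8])
first_frame_complement = "".join(str(b) for b in complement(output)[:8])

print(first_frame)
print(first_frame_complement)
```

With eight inputs at 2.5 GHz, the serialized stream carries eight bits per input period, i.e. 20 GHz.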

Fig. 3.26 shows the layout of a 20 GHz 8:1 MUX in the 1 kA/cm² UCB Nb process. Clock1 and
Clock2 come from the same external clock source but pass through different JTL stages. The
skew between the two clocks was chosen according to the Dff setup/hold time and the
previously calculated Clock1-to-Data_Dff delay. For verification, we also made layouts of a
4:1 MUX, a 4:1 MUX with an on-chip high-speed test system, and an 8:1 MUX with an on-chip
high-speed test system, which will be discussed in Section 5.3.

3.6.2  50  GHz  MUX  Design,  Layout  and  Optimization 

The basic cells using the 1 kA/cm² design parameters are verified in the 6.5 kA/cm² process.
As before, some connection parasitic inductances are already included in the simulations.
The dc bias margins of the cells in the 6.5 kA/cm² process are listed in Table 3-8. The dc
bias margin of the 8:1 MUX is (-26%, +28%). Again, the large dc bias margins achieved are
partly due to not including all the junction parasitic inductances.

TABLE 3-8 Dc bias margins of the basic cells used in the 50 GHz 6.5 kA/cm² MUX.

Cell name    Dc bias margins
CB           (-40%, +46%)
Tff          (-28%, +32%)
RSff         (-46%, +36%)
Dff          (-26%, +28%)

Monte Carlo analysis is performed to evaluate the Clock1-to-Data_Dff delay variation. The
6.5 kA/cm² process variations in Table 3-2 are used. The histogram of the delay variations
and its Gaussian fitting curve are plotted in Fig. 3.27. The total count is 138. The standard
deviation is 0.46 ps. The 6σ delay variation is 2.8 ps, which is still a small portion of the
20 ps clock period at 50 GHz. The small delay variation is due to the small process
variations assumed for the UCB high-Jc Nb process. Fig. 3.28 shows the 50 GHz waveforms of
the 8:1 MUX.

Figure 3.27 Histogram of the 50 GHz 8:1 MUX data path delay variation in the 6.5 kA/cm²
process.

Figure 3.28 50 GHz 8:1 MUX simulation waveforms.

The Tff, CB, and Dff are then laid out and post-layout optimizations are done. Since the
junction model in WinS has to be an RSJ model without parasitic inductances, further circuit
performance enhancement was done by manually adjusting the circuit parameters.

Fig. 3.29 shows the layout of the Tff in the 6.5 kA/cm² process and its corresponding block
diagram. Systematic moats are applied in the circuit layout. Ic3 is changed from 356 µA to
325 µA for better parameter margins. This block is put on the first 6.5 kA/cm² test chip to
be verified. The verification of this cell was designed to be very simple, without DC/SFQ and
SFQ/DC cells. The input SFQ pulses are generated by over-biasing the input junction Jinput,
with Ic,input = 251 µA. When Ib,input = 323 µA in simulation, the input pulse frequency is
about 50 GHz. Ic,Output1 = Ic,Output2 = 251 µA, and the output junctions are biased at
175 µA. The voltage waveforms in Fig. 3.30 show that the output pulse frequency is half of
the input frequency. With such a simple arrangement, this Tff has dc bias margins of
(-30%, +38%) and can work up to 220 GHz.

Figure 3.29 The 6.5 kA/cm² Tff layout and its corresponding block diagram.

Figure 3.30 Simulation waveforms of the 6.5 kA/cm² Tff.

Shown in Fig. 3.31 is the layout of the Dff in the 6.5 kA/cm² process. Post-layout simulation
shows substantial margin loss if all the junction parasitic inductances are included in the
simulations. The manual re-optimization could only recover the circuit dc bias margins to (-21.7%,
+15.7%). The new circuit parameters are implemented in this layout and put on the first 6.5
kA/cm² test chip. The circuit parameters are recorded in Section 4.3.3, since the 50 GHz
high-speed test system also uses this Dff.


Figure 3.31 Layout of the 6.5 kA/cm² Dff.
Post-layout optimization was also done for the CB; it is discussed in detail in Section 4.3.1 as
part of the high-speed test system design. The achieved post-layout dc bias margins are (-28.7%,
+29.6%). The post-layout dc bias margins of the re-optimized cells are listed in Table 3-9.

TABLE 3-9 Post-layout dc bias margins of the basic cells to be used in the 50 GHz 6.5 kA/cm² MUX.

Cell name                     Dc bias margins
CB                            (-28.7%, +29.6%)
Tff (w/ all 1s as input)      (-30%, +38%)
Dff                           (-21.7%, +15.7%)
CHAPTER 4

50 GHz On-Chip Testing System


4.1 Introduction

Direct high-speed testing of RSFQ circuits is expensive, and it is limited by the signal loss
along the cables to around 20 GHz with the current commercially available testing equipment. The
difficulty arises from the very high circuit operation speed and the small amplitude of the
signals. SFQ/DC converters are placed at the RSFQ circuit outputs to convert SFQ pulses to voltage
waveforms. The signals coming out of the SFQ/DC converters are only a few hundred microvolts.
Without the SFQ/DC conversion, the picosecond SFQ pulses would be even less likely to survive the
dispersion and loss along the cables. RSFQ circuits can operate at a few tens of gigahertz, with
the potential to go above 100 GHz. For RSFQ circuit function verification at speeds above 20 GHz,
an on-chip high-speed testing system is necessary [52].

The  idea  of  on-chip  high-speed  testing  is  that  input  data  are  loaded  to  input  shift  registers  at 
low  speed  and  stored  there  until  an  on-chip  high-speed  clock  is  turned  on  to  push  these  data 
through  the  circuit  under  test  (CUT).  After  the  high-speed  operations  of  the  CUT  are  finished,  the 
on-chip  high-speed  clock  is  turned  off.  The  results  of  the  circuit’s  high-speed  operation  are  stored 
in  output  shift-registers  and  can  be  read  out  at  low  speed  later  on  to  verify  the  circuit  operation. 


Figure 4.1 Block diagram of a DDST on-chip high-speed testing system. High-speed operations
of the circuit under test are controlled by the on-chip high-speed clock pulses
and recorded by the output shift registers. Input and output data are fed into and
read out by low-speed instruments.


Various configurations have been developed [53][54]. Shown in Fig. 4.1 is a block diagram of the
Data-Driven Self-Timed (DDST) on-chip high-speed testing system [39][55]. Unlike other
designs, an on-chip pulse generator is used to produce a fixed number of high-speed clock pulses
initiated by a trigger signal. Such a pulse generator avoids the difficulty of accurate timing
control in gating a continuous clock generator. DDST shift registers are based on the application
of dual-rail data. Timing information is embedded in the data. Therefore, no external low-speed
clock is required to load and read out data, so the effort on timing control between a high-speed
clock and a low-speed clock is saved. Previously, 20 GHz operation of such a testing system in the
1 kA/cm² niobium process was demonstrated successfully [56][57]. In this chapter, the design and
optimization of such a test system for 50 GHz operation in the 6.5 kA/cm² niobium process are
described. A pulse generator is designed and optimized to produce SFQ clock pulses at a
frequency between 11.4 GHz and 88.2 GHz. The DDST shift register is modified from the 20 GHz
design parameters and optimized to recover the dc bias margins from ±5% to (-18.3%, 15.7%) at
50 GHz. The whole testing system's dc bias margins recover from zero to (-25.2%, 15.7%) upon
reoptimization.

4.2  50  GHz  Pulse  Generator 

As discussed above, high-speed operations of the CUT are governed by an on-chip high-speed
clock. The clock pulse generator to be introduced has the merits of simple configuration and
controllable start and stop. Shown in Fig. 4.2a is a block diagram of a 4-bit ladder pulse
generator. Each stage consists of an SFQ pulse splitter (PS), a confluence buffer (CB), and JTLs
inserted along the signal paths represented by the arrows. The PS is a fork and the CB is a merger
for signals. The first clock pulse is generated after the trigger pulse travels through the first
PS, the first rung of the ladder and the first CB. The second clock pulse comes out through the
first two PSs, the second rung of the ladder and the first two CBs. The total number of clock
pulses generated from a single trigger pulse is controlled by the number of stages in the pulse
generator. The pulse interval is roughly the delay of one stage, which can be adjusted by the
number of JTLs inserted and also depends on the dc bias. In the last stage, the unconnected PS
output and CB input are each terminated by a 3.6 pH inductor and a 1 Ω resistor to ground.
Fig. 4.2b shows a simulation result of a 50 GHz 4-bit pulse generator.

Figure 4.2 A 4-bit ladder pulse generator. (a) Block diagram. (b) WRspice simulation result.

Figure 4.3 The circuit schematic of one stage PS-CB combination in the 50 GHz pulse
generator. The optimized device parameter values are as follows. Junction
critical current values: Ic1 = 262.5 μA, Ic2 = 320 μA, Ic3 = 250 μA, Ic5 = Ic10 = 312.6 μA,
Ic6 = Ic9 = 269.1 μA, Ic7 = 250 μA, Ic8 = 250 μA, Ic11 = 250 μA.
Inductance values: L1 = 4.0 pH, L2 = 1.848 pH, L3 = 1.391 pH, L4 = 4.6 pH,
L7 = 4.232 pH, L8 = L13 = 0.7 pH, L9 = 4.0 pH, L10 = 1.2 pH, L11 = 2.8 pH,
L12 = 3.0668 pH. Bias current values: IB1 = 439.6 μA, IB2 = 192.8 μA, IB3 = 253.4 μA,
IB4 = 507.7 μA, IB5 = 192.8 μA.
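The relation between the number of stages, the stage delay and the emitted clock pulses can be sketched as follows. This is only a timing abstraction under the assumption that every stage adds the same delay; the function names and the zero time origin are hypothetical and are not taken from the thesis or from WRspice.

```python
from typing import List


def ladder_pulse_times(n_stages: int, stage_delay_ps: float,
                       first_pulse_ps: float = 0.0) -> List[float]:
    """Arrival times of the clock pulses produced by one trigger pulse.

    Assumption: pulse k leaves through rung k, so it is later than the
    previous pulse by one (uniform) stage delay. One pulse per stage.
    """
    return [first_pulse_ps + k * stage_delay_ps for k in range(n_stages)]


def pulse_frequency_ghz(stage_delay_ps: float) -> float:
    """Pulse repetition frequency when the interval equals one stage delay."""
    return 1000.0 / stage_delay_ps


times = ladder_pulse_times(n_stages=4, stage_delay_ps=20.0)
print(times)                      # four pulses, 20 ps apart
print(pulse_frequency_ghz(20.0))  # 50.0 GHz for a 20 ps stage delay
```

In this picture, adding stages lengthens the burst without changing its frequency, while changing the stage delay (by adding JTLs or adjusting the dc bias) changes the frequency.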

Fig. 4.3 shows the circuit schematic of one stage PS-CB combination in the pulse generator.
The junctions shown in the schematic are resistor-shunted junctions (RSJs). They are made with
IcR = 0.592 mV and βc = 1. The parameter values listed are the result of WinS optimization. The
initial parameter values put into the optimization are obtained by modifying the earlier 20 GHz
pulse generator. The 6.5 kA/cm² junction model replaces the 1 kA/cm² model, and some JTLs are
taken out of the original circuit to shorten the clock period to about 20 ps, corresponding to
50 GHz. Parasitic inductances are not yet included in the optimization.

In WinS, optimization is set up to maximize the critical margin among the junction critical
current values, inductance values, individual bias current values and the global bias current
value. As seen from the WinS report in Fig. 4.4, the critical parameter margins are that of the
b6,b9 collection (-36.2%) and that of the global bias collection (+36.2%). The margin result is
confirmed by the WRspice simulation.

Parameter         Type           Value      Margins (-% | +%)
b1                Collection     1          90.0 | 76.6
b2                Collection     1          82.6 | 68.9
b3                Collection     1          77.0 | 86.1
b5,b10            Collection     1          48.2 | 47.1
b6,b9             Collection     1          36.2 | 45.0
b7                Collection     1          90.0 | 87.9
b8                Collection     1          90.0 | 90.0
b11               Collection     1          90.0 | 83.7
IB1               Collection     1          63.3 | 59.4
IB2               Collection     1          90.0 | 83.0
IB3               Collection     1          90.0 | 86.8
IB4               Collection     1          55.2 | 44.6
IB5               Collection     1          90.0 | 79.5
L1                Collection     1          90.0 | 90.0
L2                Collection     1          90.0 | 90.0
L3                Collection     1          90.0 | 90.0
L4                Collection     1          90.0 | 90.0
L7                Collection     1          90.0 | 90.0
L8,L13            Collection     1          90.0 | 90.0
L9                Collection     1          90.0 | 90.0
L10               Collection     1          90.0 | 90.0
L11               Collection     1          90.0 | 90.0
L12               Collection     1          90.0 | 90.0
L14               Collection     1          90.0 | 90.0
jc1               RSJ Junction   0.25m      90.0 | 90.0
jc2               RSJ Junction   0.25m      90.0 | 84.7
IBc1              dc Current     0.1781m    90.0 | 90.0
IBc2              dc Current     0.1781m    90.0 | 90.0
jl1               RSJ Junction   0.25m      52.0 | 90.0
jl2               RSJ Junction   0.25m      54.5 | 90.0
IBl1              dc Current     0.1781m    90.0 | 68.9
IBl2              dc Current     0.1781m    90.0 | 71.4
Inputjtl_bias     Collection     1          90.0 | 72.4
Outputjtl_bias    Collection     1          90.0 | 72.4
se bias           Collection     1          42.2 | 36.9
scJtl_load_bias   Collection     1          41.5 | 36.2

Figure 4.4 WinS margin report on the pulse generator with parameters shown in Fig. 4.3.

Figure 4.5 Post-layout circuit schematic of one stage PS-CB combination in the 50 GHz
pulse generator. The device parameter values are as follows. Junction
critical current values: Ic1 = 264 μA, Ic2 = 325 μA, Ic3 = 251 μA, Ic5 = Ic10 = 309 μA,
Ic6 = Ic9 = 264 μA, Ic7 = Ic8 = Ic11 = 251 μA. Shunt resistor values: Rs1 = 2.24 Ω,
Rs2 = 1.82 Ω, Rs3 = 2.36 Ω, Rs5 = Rs10 = 1.92 Ω, Rs6 = Rs9 = 2.24 Ω,
Rs7 = Rs8 = Rs11 = 2.36 Ω. Parasitic inductance values: Lps6 = Lps9 = 0.5 pH,
Lpr6 = Lpr9 = 1 pH, all other Lps = 0.1 pH, Lpr7 = 0.7 pH. Inductance values: L1 = 4.0 pH,
L2 = 1.85 pH, L3 = 1.39 pH, L4 = 4.6 pH, L7 = 4.23 pH, L8 = L13 = 1 pH, L9 = 4.0 pH,
L10 = 1.2 pH, L11 = 2.8 pH, L12 = 3.07 pH. Bias resistor values: RB1 = 13.1 Ω,
RB2 = 29.8 Ω, RB3 = 22.7 Ω, RB4 = 11.3 Ω, RB5 = 29.8 Ω.


Fig. 4.5 shows the post-layout circuit schematic of the one-stage PS-CB combination. The
bias current sources are implemented by bias resistors connected to a common bias voltage source
Vbias. For connection convenience in layout, the order of L8 and b6, and of L13 and b9, is
switched compared to the pre-layout schematic. Junction critical current values are rounded to the
closest values available from our shunted junction library. Inductance extraction is done using
the program LMETER. The updated device parameter values, including parasitic inductances, are
listed with the post-layout schematic. Post-layout simulation in WRspice shows that the circuit
performance with the parasitic inductances is sufficient. The critical parameter margin is that of
b6 (+36%). The dc bias margin is (-42.4%, 36%). Or equivalently, the viable dc bias voltage range
is (3.5 mV to 7.55 mV) with the nominal value at 5.75 mV. No further design modification is
needed. Fig. 4.6 shows the frequency-bias voltage relationship obtained post-optimization. The
4-bit pulse generator produces pulses in the frequency range (11.4 GHz to 88.2 GHz) by varying its
dc bias voltage in the range (3.5 mV to 7.55 mV). The center frequency is 52.2 GHz at the nominal
bias voltage of 5.75 mV.

Figure 4.6 Pulse frequency vs. dc bias voltage, Vbias in Fig. 4.5.
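As a rough cross-check of the resistor-based biasing, the sketch below converts the bias resistor and shunt resistor values listed with Fig. 4.5 into currents. It assumes that the resistor values are in ohms, that the voltage across a zero-voltage-state junction is negligible so that each bias current is simply Vbias divided by the bias resistance, and that the library junctions keep the IcR product of 0.592 mV quoted for Fig. 4.3. The names are illustrative only.

```python
V_BIAS_MV = 5.75   # nominal dc bias voltage (mV)
IC_R_MV = 0.592    # IcR product of the shunted junctions (mV), assumed unchanged

# Values as listed with Fig. 4.5, read here as ohms (an assumption).
BIAS_RESISTORS_OHM = {"RB1": 13.1, "RB2": 29.8, "RB3": 22.7, "RB4": 11.3, "RB5": 29.8}
SHUNT_RESISTORS_OHM = {"Rs1": 2.24, "Rs2": 1.82, "Rs3": 2.36}


def bias_current_ua(v_bias_mv: float, r_bias_ohm: float) -> float:
    """Bias current in microamps, neglecting the junction voltage drop."""
    return 1000.0 * v_bias_mv / r_bias_ohm


def critical_current_ua(ic_r_mv: float, r_shunt_ohm: float) -> float:
    """Critical current in microamps implied by a fixed IcR product."""
    return 1000.0 * ic_r_mv / r_shunt_ohm


for name, r in BIAS_RESISTORS_OHM.items():
    print(f"{name}: {bias_current_ua(V_BIAS_MV, r):.1f} uA")
for name, r in SHUNT_RESISTORS_OHM.items():
    print(f"{name}: Ic = {critical_current_ua(IC_R_MV, r):.1f} uA")
```

With these assumptions the computed currents land within about a microamp or two of the pre-layout bias currents (for example about 439 μA for RB1) and of the listed critical currents (about 264, 325 and 251 μA).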

Further simulation shows that longer pulse generators can be built without sacrificing margins.
Fig. 4.7 shows a micrograph of a 16-bit pulse generator put on the test chip for verification. A T
flip-flop is connected to the output of the pulse generator to reduce the output frequency to one
half. There is an additional built-in T flip-flop in the SFQ/DC converter following the Tff. So,
with a spectrum analyzer with an upper frequency limit of 20 GHz, the pulse generator can be
verified up to 80 GHz. As marked in the micrograph, the pulse generator's dc bias voltage can be
adjusted independently. So its full dc bias operating range and the corresponding clock frequency
can be tested without being limited by the peripheral circuits' dc bias margins.

Figure 4.7 Micrograph of a 16-bit pulse generator with peripheral circuits on the test chip.
(Labeled blocks: DC/SFQ, SFQ/DC combo, separate dc bias for the pulse generator, SFQ/DC, Tff,
16-bit pulse generator.)
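The 80 GHz figure follows from the two divide-by-two stages in front of the spectrum analyzer. A minimal sketch of that arithmetic, assuming ideal frequency halving in each T flip-flop (the function names are illustrative):

```python
ANALYZER_LIMIT_GHZ = 20.0  # upper frequency limit of the spectrum analyzer
N_TFF = 2                  # external Tff plus the Tff built into the SFQ/DC converter


def observed_frequency_ghz(generator_ghz: float, n_tff: int = N_TFF) -> float:
    """Frequency seen by the analyzer after n_tff ideal divide-by-two stages."""
    return generator_ghz / (2 ** n_tff)


def max_verifiable_frequency_ghz(limit_ghz: float = ANALYZER_LIMIT_GHZ,
                                 n_tff: int = N_TFF) -> float:
    """Highest generator frequency whose divided output stays within the limit."""
    return limit_ghz * (2 ** n_tff)


print(max_verifiable_frequency_ghz())   # 80.0
print(observed_frequency_ghz(52.2))     # 13.05 GHz at the nominal bias point
```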


4.3  Data-Driven  Self-Timed  (DDST)  Shift  Registers 

The DDST shift registers are used to store the input data used by the CUT in the high-speed
operations and to record the high-speed operation result, which we can read off-chip at low speed.
Fig. 4.8 shows the block diagram of a 4-bit DDST shift register. It consists of a front stage to
recover timing information, three stages of single-rail shift registers (SR) and a D flip-flop at
the end to regenerate dual-rail outputs. The SR and D flip-flop are clocked gates. The front stage
combines the dual-rail input data to generate a local clock for the SR and the D flip-flop.
Meanwhile, the positive input data propagate to the first SR. In each clock cycle, the data are
shifted right one bit. The last stage is a D flip-flop instead of a single-rail SR, where the
dual-rail outputs are recovered. With the data-driven self-timing strategy, the difficulty of
generating and distributing a very high-speed global clock is avoided. But within the DDST system,
careful timing is still very important for the circuit to achieve good dc bias margins at 50 GHz.
We will introduce each building block and its timing concerns in the following sections. Since the
D flip-flop and the SR are both synchronous circuits, the data signal has to arrive a time tsetup
before the clock signal and a time thold after the clock signal, as illustrated in Fig. 1.16(b).
The required setup and hold times of the D flip-flop and the SR are carefully characterized within
the entire dc bias range. The clock-to-data delay of the previous stage is calculated and compared
with the setup/hold time requirement to make sure enough timing margin is guaranteed. The
simulation results on a 4-bit shift register and a two-stage cascaded 4-bit shift register will
also be reported at the end. One limitation of the DDST shift register is that it requires
dual-rail data.

Figure 4.8 Block diagram of a 4-bit DDST shift register. Solid dots are pulse splitters (PS).
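The data-driven self-timing idea can be illustrated with a purely behavioural toy model. It ignores all pulse timing and analog effects, and it assumes that the three SR stages plus the final D flip-flop act as four storage cells and that each local clock pulse reads out the last cell before the new bit is stored. The class and function names are hypothetical and do not correspond to any cell in the thesis.

```python
from typing import List, Optional, Tuple

DualRail = Tuple[int, int]  # (pulse on In, pulse on complement of In)


def encode_dual_rail(bit: int) -> DualRail:
    """A 1 is a pulse on the positive rail, a 0 a pulse on the negative rail."""
    return (1, 0) if bit else (0, 1)


class DdstShiftRegister:
    """Behavioural sketch: every dual-rail input bit brings its own clock."""

    def __init__(self, n_bits: int = 4) -> None:
        self.cells: List[Optional[int]] = [None] * n_bits

    def push(self, word: DualRail) -> Optional[DualRail]:
        pos, neg = word
        if pos + neg != 1:
            raise ValueError("dual-rail input needs exactly one pulse per bit")
        # Front stage: merging both rails yields one local clock pulse per bit,
        # so no external low-speed clock is needed.
        out = self.cells[-1]                  # clock reads out the last cell
        self.cells = [pos] + self.cells[:-1]  # data shift right by one bit
        return None if out is None else encode_dual_rail(out)


reg = DdstShiftRegister(4)
for bit in [1, 0, 1, 1]:          # low-speed load
    reg.push(encode_dual_rail(bit))
read_back = [reg.push(encode_dual_rail(0)) for _ in range(4)]
print(read_back)                  # stored bits come out first-in, first-out
```

The point of the sketch is only that loading and reading need no separate clock line: shifting happens whenever a dual-rail bit arrives.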
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(a) 


(b) 


■DATA 


Figure  4.9  Block  diagrams  of  the  front  stage  in  the  DDST  shift  register,  (a)  Current  imple¬ 
mentation.  (b)  Possible  improvement. 


4.3.1  Front  Stage 

Shown in Fig. 4.9a is the circuit block diagram of the currently implemented front stage in the
DDST shift register. The complementary inputs In and $\overline{\mathrm{In}}$ are combined by a
confluence buffer (CB) to generate the local clock signal CLK. One extra JTL stage is inserted
between $\overline{\mathrm{In}}$ and the CB to match the delay of the PS. Three-stage JTLs are
used before DATA to achieve proper timing between CLK and DATA. Fig. 4.10 shows the post-layout
circuit schematics of the components in the front stage. The inductance values are extracted from
the layout. Parasitic inductance values are also included. The dc bias current values in
parentheses are at Vbias = 5.75 mV. The CB is the critical block in the front stage, and it has dc
bias margins of (4.25 mV, 7.65 mV), (-26.1%, 33.0%). The dc bias margins of the front stage from
the post-layout simulation are (4.6 mV, 7.6 mV), (-20%, 32.2%). The lower-end dc bias margin of
the front stage is worse than that of the CB. One possible reason is that the delay difference
between the In path and the $\overline{\mathrm{In}}$ path gets larger at lower dc bias voltage,
which causes the CB to fail at 4.6 mV instead of 4.25 mV. The delay from CLK to DATA is a function
of the dc bias voltage. Table 4-1 shows the CLK to DATA delay from the post-layout simulation.

Figure 4.10 Post-layout circuit schematics of the components in the front stage.
(a) JTL (b) PS (c) CB.
TABLE 4-1 CLK to DATA delay of the front stage as a function of the dc bias voltage.

CLK to DATA delay (ps)    dc bias voltage (mV)
4.5                       7.6
4.1                       5.75
1.2                       4.6

Shown in Fig. 4.9b is a block diagram of the input stage with some timing improvement.
Instead of using one stage of JTL to match the PS delay, the same PS is inserted in the
$\overline{\mathrm{In}}$ path for perfect delay matching. This approach can help increase the
lower-end dc bias margin of the CB at 50 GHz, which is the bottleneck of the whole front stage. A
CB is inserted in the DATA output path to match the CB delay in the CLK path. As a result, when
the dc bias voltage is decreased, the delay from CLK to DATA is increased, which is the timing
condition preferred by the next stage. One JTL is inserted between the PS and the CB to slightly
improve the circuit dc bias margins. Besides the timing adjustment, the dc bias level of the CB is
scaled to center its dc bias margins. The two bias resistors are changed from 14.13 Ω and 46.75 Ω,
as in Fig. 4.10, to 13.66 Ω and 45.19 Ω. The new dc bias margins of the CB are (4.1 mV, 7.45 mV),
(-28.7%, 29.6%) at 50 GHz, exactly the same as those of the improved front stage as a whole at
50 GHz. So we know the timing matching here helped to increase the circuit dc bias margin. The new
delay from CLK to DATA from post-layout simulation is listed in Table 4-2.

TABLE 4-2 CLK to DATA delay of the improved front stage as a function of the dc bias voltage.

CLK to DATA delay (ps)    dc bias voltage (mV)
5.2                       7.45
7.1                       5.75
10.5                      4.1

Figure 4.11 Post-layout circuit schematics of one stage SR.


The timing improvement comes at the cost of more devices, area and power. As we will see later,
the bottleneck of the whole DDST shift register is not the front stage, even without the timing
improvement. So we did not implement the timing-improved version.

4.3.2  SR  Stage 

Fig. 4.11 shows one stage of the single-rail shift register (SR). The core of the SR is an RS
flip-flop with the detailed post-layout parameters marked. The JTL and PS have the same circuit
parameters as in Fig. 4.10. Between the clock pulses, incoming data set the state of the RS
flip-flop. With the arrival of clock pulses, the RS flip-flop resets its state and generates
output pulses accordingly. The JTLs are inserted to adjust timing. The PS is for clock
propagation. The timing of the SR is designed to have one clock cycle of latency.
Fig. 4.12 shows the two-dimensional operation range simulation result of one stage SR at 50
GHz. The horizontal axis is the dc bias voltage. The vertical axis is the delay from Clock_In to
Data_In. At the nominal dc bias voltage of 5.75 mV, the viable delay range is (-4.5 ps to 14 ps).
For larger dc bias voltages up to 7.45 mV, the viable delay range stays almost the same. But when
the dc bias voltage is below 4.5 mV, the viable delay range starts to shrink. At 4.2 mV, the
viable delay range is (0 ps to 17 ps). The minimum operable dc bias voltage is 3.9 mV, where the
viable delay range is (4.5 ps to 12.5 ps). So we know the maximum achievable dc bias margins are
(3.9 mV, 7.4 mV), (-32.2%, 28.7%) if we control the input delay within (4.5 ps to 12.5 ps). For
delays less than 4.5 ps, the dc bias margin starts to shrink. When the delay is 0 ps, the dc bias
margins shrink to (4.2 mV, 7.4 mV), (-27.0%, 28.7%).
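The percentage margins quoted alongside the voltage ranges follow from normalizing to the nominal bias voltage of 5.75 mV. A small sketch of that conversion (the function name is illustrative):

```python
from typing import Tuple

NOMINAL_MV = 5.75  # nominal dc bias voltage (mV)


def bias_margins_percent(low_mv: float, high_mv: float,
                         nominal_mv: float = NOMINAL_MV) -> Tuple[float, float]:
    """Lower and upper dc bias margins in percent of the nominal voltage."""
    lower = 100.0 * (low_mv - nominal_mv) / nominal_mv
    upper = 100.0 * (high_mv - nominal_mv) / nominal_mv
    return round(lower, 1), round(upper, 1)


print(bias_margins_percent(3.9, 7.4))  # (-32.2, 28.7): delay kept within 4.5 to 12.5 ps
print(bias_margins_percent(4.2, 7.4))  # (-27.0, 28.7): delay of 0 ps
```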
Figure 4.12 Two-dimensional operation range of a one-stage SR at 50 GHz.

Figure 4.13 Timing at the input of the first SR in the DDST shift register at 50 GHz.


In  Fig.  4.13,  the  output  clock-to-data  delays  of  the  front  stage  in  Table  4-1  and  Table  4-2  are 
plotted  and  compared  with  the  timing  requirement  at  the  input  of  the  first  SR.  We  can  see  that  both 
the  current  design  and  timing-improved  front  stage  satisfy  the  SR  timing  requirement  within  their 
own  operable  dc  bias  voltage  range.  However,  the  timing-improved  version  can  extend  its  dc  bias 
margin  down  to  3.9  mV,  while  the  current  version  works  only  down  to  4.6  mV.  On  the  other  hand, 
the  smaller  delay  of  the  current  version  is  actually  preferred  when  we  are  trying  to  push  the  circuit 
to  operate  at  speeds  higher  than  50  GHz.  As  long  as  4.6  mV  is  not  the  bottleneck  of  the  whole 
block,  the  current  version  has  a  satisfactory  timing  design. 

The timing when two SRs are cascaded is also checked. Table 4-3 lists the Clock_Out to
Data_Out delay of one stage SR when its setup/hold time is well satisfied. The delay with one
extra JTL inserted at the output is also listed for discussion. In Fig. 4.14, the delay from
Table 4-3 is plotted in comparison with the timing requirement at the input of the SR. The current
implementation satisfies the timing for dc bias voltages above 4.1 mV. With one extra JTL inserted
at the output, the timing requirement is satisfied for the entire dc bias range.

TABLE 4-3 Clock_Out to Data_Out delay vs. dc bias voltage of one stage of SR.

                        Clock_Out to Data_Out delay (ps)
dc bias voltage (mV)    current implementation    w/ 1 extra JTL at the output
7.4                     1.4                       4.7
5.75                    2.0                       6.5
3.9                     1.7                       8.7
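The cascade check described above amounts to testing whether the output delay of one SR falls inside the viable input-delay window of the next SR at the same bias voltage. The sketch below does this for the two bias points where the text quotes a window explicitly (5.75 mV and 3.9 mV); the delays are as read from Table 4-3, and the dictionary layout and function names are illustrative assumptions.

```python
from typing import Dict, Tuple

# Viable Clock_In-to-Data_In windows of one SR stage (ps), keyed by bias voltage (mV).
SR_WINDOW_PS: Dict[float, Tuple[float, float]] = {
    5.75: (-4.5, 14.0),
    3.9: (4.5, 12.5),
}

# Clock_Out-to-Data_Out delays (ps) as read from Table 4-3.
SR_OUTPUT_DELAY_PS: Dict[str, Dict[float, float]] = {
    "current": {7.4: 1.4, 5.75: 2.0, 3.9: 1.7},
    "extra_jtl": {7.4: 4.7, 5.75: 6.5, 3.9: 8.7},
}


def meets_window(delay_ps: float, window: Tuple[float, float]) -> bool:
    """True if the delay lies inside the viable input-delay window."""
    low, high = window
    return low <= delay_ps <= high


def check_cascade(version: str) -> Dict[float, bool]:
    """Timing check at every bias voltage for which a window is known."""
    delays = SR_OUTPUT_DELAY_PS[version]
    return {v: meets_window(delays[v], w) for v, w in SR_WINDOW_PS.items()}


print(check_cascade("current"))    # fails only at the lowest bias voltage
print(check_cascade("extra_jtl"))  # passes at both bias points
```

This reproduces the qualitative statement in the text: the current implementation misses the window at the low end of the bias range, while the version with one extra JTL stays inside it.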


Fig. 4.15 shows the two-dimensional operation range simulation results of three stages of
cascaded SRs at 50 GHz. The maximum achievable dc bias margins are (4.55 mV, 7.3 mV), (-20.9%,
27.0%), which are much smaller than those of one stage SR, (3.9 mV, 7.4 mV), (-32.2%, 28.7%). They
do not improve with one stage of JTL inserted at the SR output. This means that timing violation
is not the reason for the circuit failure at 50 GHz at low dc bias voltage. The interaction and
interference among the clock pulses and data pulses could be the main reason for the failure. At
the low dc bias


Figure 4.14 Timing at the input of the 2nd and 3rd SR in the DDST shift register at 50 GHz.
(Horizontal axis: dc bias voltage (mV). Curves: SR input delay upper boundary; SR output delay
w/ 1 extra JTL stage; SR output delay; SR input delay lower boundary.)




[Plot residue: pass/fail operating-range map. Legend: pass, fail. Vertical-axis limits: -7.5 and 17.5.]

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

►24 

>7< 

►7* 

►24 

►2< 

* 

►:< 

:: 

n 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

333 

ia 

a 

a 

a 

a 

a 

a 

a 

a 

333 

ia 

a 

a 

a 

a 

a 

a 

a 

a 

►74 

►7 

•74 

►74 

►l< 

!! 

1! 

!! 

a 

a 

a 

a 

a 

!! 

3! 

:: 

:: 

5! 

33 

3! 

:: 

:: 

Ei 

3! 

3! 

31 

1! 

:: 

31 

3! 

1! 

3! 

3! 

31 

1! 

33 

31 

131 

31! 

33 

3! 

31 

3! 

31 

1: 

:: 

El 

111 

31! 

11 

:: 

ll 

l! 

1! 

1! 

11 

1! 

►74 

►24 

►24 

►74 

* 

►:< 

13 

11 

a 

a 

a 

a 

a 

a 

a 

31 

a 

a 

s: 

a 

3! 

31 

a 

a 

a 

a 

31 

a 

a 

a 

a 

a 

31 

a 

31 

a 

a 

31 

333 

331 

a 

a 

31 

S3 

31 

a 

a 

31 

333 

331 

a 

a 

31 

a 

a 

31 

a 

a 

►24 

►2< 

►7< 

►24 

►2? 

* 

ijl 

►24 

►H 

* 

►2« 

□ 

n 

□ 

n 

□ 

33 

S3 

E3 

33 

33 

33 

33 

E3 

E3 

33 

33 

33 

33 

33 

33 

□ 

33 

33 

33 

a 

33 

a 

a 

a 

S3 

□ 

33 

□ 

33 

S3 

S3 

□ 

33 

a 

33 

□ 

33 

□ 

33 

33 

33 

a 

ES 

□ 

33 

33 

33 

33 

33 

33 

33 

S3 

S3 

33 

33 

33 

33 

E33 

E33 

333 

333 

33 

EE 

□ 

33 

□ 

33 

a 

33 

a 

a 

a 

E3 

□ 

33 

33 

33 

ESS 

E33 

3  33 
333 

33 

39 

□ 

a 

□ 

33 

33 

39 

□ 

33 

33 

33 

□ 

33 

33 

33 

►24 

►74 

►2< 

►71 

►2? 

►71 

►24 

►24 

►I" 

* 

►:< 

►74 

11 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

333 

ia 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

►24 

►2< 

►24 

►24 

iji 

i^l 

* 

»7< 

S3 

33 

ss 

33 

33 

S3 

33 

33 

33 

33 

33 

S3 

ee 

33 

33 

33 

33 

S3 

33 

□ 

33 

33 

33 

a 

a 

a 

a 

a 

a 

□ 

33 

13 

33 

S3 

S3 

S3 

33 

a 

33 

33 

33 

33 

33 

33 

33 

33 

33 

□ 

33 

S3 

33 

33 

33 

33 

33 

S3 

EE 

33 

33 

33 

33 

E33 

E33 

333 

3E3 

33 

39 

33 

33 

33 

33 

a 

a 

a 

33 

a 

a 

□ 

1: 

33 

33 

ESS 

E33 

333 

333 

33 

39 

a 

33 

□ 

33 

33 

39 

33 

33 

33 

33 

33 

33 

33 

33 

►24 

►24 

►2.4 

►71 

►2.4 

►74 

►24 

►24 

►1< 

•:< 

►24 

►2< 

•74 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

►24 

►2< 

•2.4 

►24 

►2< 

►7< 

* 

►14 

* 

►14 

►74 

u 

£4 

33 

33 

33 

33 

33 

33 

33 

33 

S3 

S3 

33 

33 

33 

33 

33 

33 

33 

□ 

33 

33 

a 

a 

a 

a 

a 

a 

33 

□ 

33 

□ 

S3 

S3 

S3 

□ 

a 

a 

33 

33 

33 

□ 

33 

33 

33 

33 

33 

33 

33 

33 

33 

33 

33 

33 

S3 

S3 

33 

33 

33 

33 

333 

333 

333 
3  33 

as 

33 

33 

33 

33 

33 

a 

a 

a 

a 

a 

a 

33 

□ 

33 

33 

ESS 

ESS 

333 
3  33 

33 

33 

ES 

□ 

33 

□ 

as 

33 

33 

□ 

33 

33 

33 

33 

33 

33 

►24 

►24 

►2? 

>T« 

►2.4 

►7< 

►24 

►24 

►74 

►14 

•74 

►7< 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

333 

sa 

a 

a 

a 

a 

a 

a 

a 

a 

►74 

►7 

•74 

►74 

►*« 

►7« 

*1* 

* 

•24 

►74 

►74 

»x< 

►T4 

>1* 

►2< 

►*4 

►74 

►74 

* 

►2< 

►74 

£4 

►7< 

13 

33 

33 

►I* 

33 

33 

□ 

S3 

33 

►2.4 

►74 

u 

□ 

►74 

►74 

33 

33 

33 

►74 

33 

33 

a 

u 

□ 

►74 

►74 

33 

►74 

►74 

►T*1 

33 

33 

* 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

a 

33 

□ 

□ 

33 

33 

□ 

□ 

33 

S3 

S3 

S3 

S3 

33 

□ 

□ 

33 

a 

a 

a 

33 

u 

33 

33 

Ei 

33 

33 

33 

33 

13 

33 

33 

33 

13 

33 

33 

33 

13 

33 

□ 

33 

13 

33 

33 

S3 

13 

33 

33 

33 

13 

33 

33 

33 

u 

□ 

►74 

►74 

13 

33 

33 

11 

13 

33 

33 

33 

Ell 

333 

E33 

ei>: 

313 

333 

333 

u 

□ 

□ 

>T< 

13 

□ 

□ 

►71 

13 

33 

□ 

a 

a 

a 

13 

a 

a 

►74 

u 

□ 

►74 

►14 

13 

□ 

►74 

S3 

33 

►I4 

>7< 

Ul 

►24C 

►74  ►: 
*►: 

113 

333 

>2< 

u 

□ 

►74 

►74 

13 

□ 

►2.4 

►71 

u 

□ 

►24 

►74 

u 

□ 

►74 

►7< 

13 

□ 

►2.4 

►74 

u 

□ 

►24 

►74 

u 

►2< 

►2< 

►74 

13 

►7< 

►2.4 

►T1 

►74 

►24 

►24 

►74 

►74 

►2< 

►2< 

►71 

►7< 

►2.4 

►71 

►74 

►14 

►24 

►24 

* 

►74 

►74 

•74 

* 

►7» 

►74 

►7* 

* 

►74 

►7 

•74 

►74 

►74 

a 

►7*  ►!< 

a 

a 

a 

a 

►7« 

a 

a 

* 

►74 

►74 

►74 

►74 

►7* 

►74 

►7< 

►74 

»2 

<* 

►24 

►74 

* 

►2« 

►74 

►24 

* 

►74 

►24  ► 

.-24 

►74 

►74 

►24 

►24 

►7* 

►24 

►7 

►7« 

►24 

►7 

►24 

►24 

•74 

►74 

►74 

•74 

►7< 

>74 

►74 

►74 

►74 

►74 

►74 

►74 

►74 

►74 

►74 

►74  *2< 

►74 

►7« 

►74 

►74 

►74 

►7< 

►74 

►74 

►74 

►74 

►74 

•74 

►7 

►74 

►74 

►74 

►74 

►74  ►: 

4>7< 

►74 

►74 

►74 

►7< 

►74 

►74 

►7< 

►74 

►74  ►[ 

4  »7< 

►74 

►74 

►74 

►7< 

►74 

►74 

►7< 

►74 

►74 

►74 

►24 

►74 

►J 

►2< 

*:< 

►74 

►2< 

►2* 

►24 

* 

►2< 

►74 

►74 

►24 

►24 

* 

►24 

►24 

►T4 1*74 

►24 

►2< 

►74 

* 

►24 

►24 

* 

►24 

►24 

►24 

►24 

►74 

►2< 

►74 

* 

►24 

►74 

*•: 

4  >24 

* 

►74 

►24 

►7* 

►74 

►74 

►7* 

►74 

►:<>' 

4>2< 

►:< 

►24 

►2< 

►24 

►24 

►74 

►24 

►74 

►:< 

►24 

►74 

>:< 

►j 

•24 

►74 

►74 

•74 

►7 

►2« 

►74 

•74 

►74 

►7« 

►74 

►74 

* 

•74 

►74  >!< 

►74 

►7* 

* 

►74 

►7« 

* 

►74 

►74 

►74 

►74 

►7< 

7.4 

74 

►74 

►7 

►74 

►74  ►' 

.«* 

►24 

* 

►74 

►7< 

►74 

►24 

•74 

►24  ► 

[-24 

►24 

►7 

►24 

►24 

►7< 

►24 

►2< 

►24 

►24 

►7 

>24 

►24 

4.5  7.5 


Dc  bias  voltage  (mV) 


Figure  4.15  Two-dimensional  operation  range  of  3-stage  cascaded  SRs  at  50  GHz. 


voltage, the junctions switch more slowly and the SFQ pulses start to smear out and interact with each
other. An RSFQ digital gate such as an SR shows some analog nature: its inputs and outputs are not
strictly isolated. When multiple gates are put together, the dc bias margins are further reduced
by the interference among the signal pulses at 50 GHz. This is the bottleneck for the lower-end
dc bias margin of the entire DDST shift register, so improving the timing of the front stage and
SR is not necessary.

Some previous shift register designs were studied as references [58][59][60][61].

4.3.3  D  Flip-Flop 

Fig. 4.16 shows the post-layout circuit schematics of the D flip-flop. This is the most difficult
circuit block in the shift register because of the multiple junction-inductance loops involved in recovering
the dual outputs. The detailed operation of this circuit was discussed in Section 1.3.2. Each
incoming data pulse sets the internal state of the D flip-flop. The clock pulse resets the flip-flop
and generates a pair of complementary outputs. The pre-layout simulation with optimized circuit
parameters, not including the parasitic inductances, achieves (-29%, 29%) dc bias margins.
However, because of the complicated loops, once the parasitic inductances are included the dc bias margin based on
the original circuit parameters drops dramatically, and reoptimization is necessary. Since Wins cannot
model such complicated parasitic effects, the reoptimization was done manually. The parameters
shown in Fig. 4.16 are the results of the reoptimization.

Fig. 4.17 shows the two-dimensional operation range of the D flip-flop at 50 GHz. The maximum
achievable dc bias margins are (4.5 mV, 6.65 mV), or (-21.7%, 15.7%), a substantial decrease
from the pre-layout simulation results.

Fig. 4.18 compares the output clock-to-data delay of the SR with the timing requirement at the
input of the D flip-flop. The current SR implementation satisfies the input timing requirement over
the D flip-flop’s entire operable dc bias range. Removing a half stage of JTL from the data input of
the D flip-flop may improve the timing margin further.

Figure 4.16 Post-layout schematics of (a) djtl and (b) the D flip-flop in the DDST shift register.
L1 = 4.5 pH, L2 = L3 = 2.3 pH in djtl1. L1 = 4.635 pH, L2 = L3 = 2.33 pH in djtl2.


4.3.4  4-bit  DDST  Shift  Register 

A 4-bit DDST shift register can be built from the blocks discussed above. The block diagram
was shown in Fig. 4.8, and the operation was explained at the beginning of Section 4.3. It is a
self-timed circuit with internal synchronous blocks. For the clock distribution inside the shift register,
the concurrent timing strategy is used, i.e., data and clock flow in the same direction. Compared to
counter-current timing, where data and clock flow in opposite directions, concurrent timing is
more favorable for high-speed operation since the delay along the data path is partially matched
by the delay along the clock path. With this strategy and careful timing control of each stage,
correct functioning of the 4-bit DDST shift register at 50 GHz is achieved. Fig. 4.19 shows the
simulation waveforms of the 50 GHz operation of the 4-bit DDST shift register. In/$\overline{\mathrm{In}}$ and Out/$\overline{\mathrm{Out}}$
are the complementary inputs and outputs of the DDST shift register. D1 and Clk1 are the data and
clock inputs to the first SR. D4 and Clk4 are the data and clock inputs to the D flip-flop. The Clk4
pulse ringing is the effect that limits the lower-end dc bias margin of the 4-bit DDST shift register.
Out/$\overline{\mathrm{Out}}$ are the delayed versions of In/$\overline{\mathrm{In}}$ with a 4-clock-cycle latency. One virtue of the circuit
is that the data-clock relative delay variation does not accumulate over stages since each stage is
clocked, which helps combat process variations. The dc bias margins of the 4-bit DDST shift
register are (4.7 mV, 6.65 mV), or (-18.3%, 15.7%).

Figure 4.17 Two-dimensional operation range of the D flip-flop at 50 GHz.

Figure 4.18 Timing at the input of the D flip-flop in the DDST shift register at 50 GHz.

Figure 4.19 Simulation waveforms of the 4-bit DDST shift register with 50 GHz operation
at the nominal dc bias voltage of 5.75 mV.

Figure 4.20 Simulation waveforms of two cascaded 4-bit shift registers with 50 GHz
operation at the nominal dc bias voltage of 5.75 mV.
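The clock-cycle latency of a shift register, and the way latencies add when registers are cascaded, can be illustrated with an idealized behavioral model. The sketch below is only an assumption-laden illustration: it treats each stage as a perfect one-clock delay, ignores the complementary pulse encoding and all analog timing effects, and assumes the register starts in the all-zero state. The function name `shift_register` is hypothetical.

```python
from collections import deque

def shift_register(bits, stages):
    """Ideal clocked shift register: each input bit appears at the
    output `stages` clock cycles later (assumed all-zero initial state)."""
    state = deque([0] * stages, maxlen=stages)
    out = []
    for bit in bits:
        out.append(state[0])  # the oldest stored bit leaves the register
        state.append(bit)     # the new bit enters; maxlen drops the oldest
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
four_bit = shift_register(data, 4)       # 4-clock-cycle latency
cascaded = shift_register(four_bit, 4)   # two 4-bit registers in series
```

In this model, cascading two 4-stage registers behaves exactly like a single 8-stage register, so the latency is 8 clock cycles.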

An 8-bit DDST shift register can be easily constructed from two cascaded 4-bit DDST shift
registers. Fig. 4.20 shows the simulation waveforms of two cascaded 4-bit DDST shift registers
with 50 GHz operation. In/$\overline{\mathrm{In}}$ are the complementary inputs. Out'/$\overline{\mathrm{Out}}$' are the outputs of the first
DDST shift register. Out/$\overline{\mathrm{Out}}$ are the outputs of the second DDST shift register and are the
delayed versions of In/$\overline{\mathrm{In}}$ with an 8-clock-cycle latency. The dc bias margins are (4.75 mV, 6.65 mV),
or (-17.4%, 15.7%).

Table 4-4 lists the dc bias margins of the individual blocks, the 4-bit shift register, the 2-stage
cascaded 4-bit shift registers, and the whole testing system, which is discussed in the next section.


TABLE 4-4 Summary of the dc bias margin of the 4-bit DDST shift register and its
components at 50 GHz.

Circuit                           dc bias margin (mV)     dc bias margin (%)
front stage                       (4.6 mV, 7.6 mV)        (-20%, 32.2%)
1 SR                              (4.2 mV, 7.4 mV)        (-27.0%, 28.7%)
3 SRs                             (4.7 mV, 7.3 mV)        (-18.3%, 27.0%)
D flip-flop                       (4.5 mV, 6.65 mV)       (-21.7%, 15.7%)
4-bit DDST shift register         (4.7 mV, 6.65 mV)       (-18.3%, 15.7%)
Two 4-bit DDST shift registers    (4.75 mV, 6.65 mV)      (-17.4%, 15.7%)
whole testing system w/o DUT      (4.3 mV, 6.65 mV)       (-25.2%, 15.7%)

Comparing  the  dc  bias  margin  of  the  4-bit  DDST  shift  register  with  that  of  the  individual 
blocks,  we  can  see  the  upper  margin  is  limited  by  the  D  flip-flop  and  the  lower  margin  is  limited  by 
SFQ  pulse  interaction  in  the  3-stage  cascaded  SRs.  It  would  be  hard  to  build  an  8-bit  DDST  shift 
register  from  7  SRs  and  1  D  flip-flop  while  maintaining  the  dc  bias  lower-end  margin  since  the 
interaction  would  be  worse.  However,  if  the  8-bit  DDST  shift  register  is  built  from  two  cascaded 
4-bit  DDST  shift  registers,  the  dc  bias  margin  remains  almost  the  same  as  for  the  single  4-bit 
DDST  shift  register. 
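The percentage margins quoted for each block follow from the bias-voltage limits and the nominal dc bias voltage of 5.75 mV given in the figure captions. The sketch below reproduces that arithmetic and also intersects the block ranges. The intersection is only a rough heuristic assumed here for illustration, since the margins in the thesis come from circuit simulation; the helper names are hypothetical.

```python
NOMINAL_MV = 5.75  # nominal dc bias voltage quoted in the figure captions

# Simulated operating ranges of the individual blocks (low, high) in mV
BLOCKS_MV = {
    "front stage": (4.6, 7.6),
    "1 SR": (4.2, 7.4),
    "3 SRs": (4.7, 7.3),
    "D flip-flop": (4.5, 6.65),
}

def margin_percent(low_mv, high_mv, nominal_mv=NOMINAL_MV):
    """Convert absolute bias limits to percent deviation from nominal."""
    low = (low_mv - nominal_mv) / nominal_mv * 100
    high = (high_mv - nominal_mv) / nominal_mv * 100
    return round(low, 1), round(high, 1)

def common_range(ranges):
    """Bias range in which every listed block still operates (heuristic)."""
    lows, highs = zip(*ranges)
    return max(lows), min(highs)

low_mv, high_mv = common_range(BLOCKS_MV.values())
percent = margin_percent(low_mv, high_mv)
```

With these numbers the common range is bounded below by the 3-SR chain and above by the D flip-flop, which matches the limits described for the 4-bit DDST shift register.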


4.4  Whole  System 

Fig. 4.21 shows the block diagram of the whole testing system without the DUT. It
mainly consists of a 4-bit pulse generator, two 4-bit DDST shift registers, a CB, and JTLs between
the blocks. The CB merges the on-chip high-speed clock Clkhs with one of the complementary data
inputs to feed the corresponding primed input of the following DDST shift register, while the other
data input propagates through a series of JTLs to the other primed input. The delays of the two
input paths are balanced.

The testing system can be verified in different ways. The low-speed function of the two DDST
shift registers can be verified by muting the pulse generator. Fed with complementary data at In/$\overline{\mathrm{In}}$
from a pattern generator, the DDST shift registers can be tested from 1 kHz to a few gigahertz. For
testing above 20 GHz, the pattern generator is programmed to assert the trigger signal in between
low-speed In/$\overline{\mathrm{In}}$ data sets, so that four consecutive high-speed pulses are generated and merged into
the primed input fed by the CB. These pulses push the 4-bit data stored in the input DDST shift
register into the output DDST shift register at high speed. The results in the output DDST shift
register can be read out at low speed by feeding the next input data pattern, which simultaneously
resets the output DDST shift register to all “0”s.


Fig. 4.22 shows the simulation waveforms of the testing system with mixed 50 GHz and 20
GHz operation. 20 GHz is chosen instead of a very low speed such as 1 kHz, which is often used in
lab testing, to save simulation time. Three sets of 20 GHz complementary data, “1 1 1 1”, “0 1 0
1”, and “0 0 0 0”, are fed through In/$\overline{\mathrm{In}}$. Two trigger pulses are programmed between the three data sets.
Each trigger pulse produces four 50 GHz clock pulses at Clkhs. As the signals propagate, one primed
input is simply a delayed version of its data input, while the other is the merge of its data input and
Clkhs. The first set of data, “1 1 1 1”, is loaded into the input shift register at 20 GHz. When the
four 50 GHz clock pulses arrive at the merged input, the data set “1 1 1 1” is pushed to the output
shift register at 50 GHz. When the second set of data, “0 1 0 1”, is loaded into the input shift
register, the first set of data is shifted out at Out/$\overline{\mathrm{Out}}$ at 20 GHz.
There is an eight-clock-cycle latency from In'/$\overline{\mathrm{In}}$' to Out/$\overline{\mathrm{Out}}$, independent of the clock rate. In turn,
the second burst of high-speed clock pulses pushes the second set of data to the output shift register
at 50 GHz. The third set of low-speed data pushes the second set of data to Out/$\overline{\mathrm{Out}}$ at 20
GHz. Overall, Out/$\overline{\mathrm{Out}}$ is the delayed version of In'/$\overline{\mathrm{In}}$' with an 8-clock-cycle latency. In laboratory
testing, 1 kHz data instead of 20 GHz data are usually programmed in a pattern generator. The 50
GHz burst at Out cannot get off chip because of the limited bandwidth, so only the 1 kHz transitions can
be observed on the oscilloscope. By verifying the correct 1 kHz output, we can infer that the
high-speed operation in between is correct. The simulated dc bias margins of the whole testing system
are (4.3 mV, 6.65 mV), or (-25.2%, 15.7%). The whole testing system has a wider
lower-end dc bias margin than the 4-bit DDST shift register because only 4 cycles of consecutive
50 GHz operation are required in between the 20 GHz operations, which relaxes the
interference between the high-speed SFQ pulses.

Figure 4.21 A block diagram of the DDST on-chip high-speed testing system w/o DUT.

Figure 4.22 Simulation waveforms of the high-speed testing system with mixed 50 GHz
and 20 GHz operation at the nominal dc bias voltage of 5.75 mV.

Figure 4.23 A micrograph of a 50 GHz testing system in the 6.5 kA/cm² process.

Fig. 4.23 shows a micrograph of the test system for the 6.5 kA/cm² process. DC/SFQ and
SFQ/DC converters are added as the interface circuits. A separate dc bias is applied to the pulse
generator so that the speed of the clock pulses can be controlled independently. This test chip was not
tested because the fabrication process failed.

Recently, however, a similar test system was implemented by others using the NEC Nb process and
was verified successfully up to 50 GHz [62].

CHAPTER 5

Test  Results 


5.1  Testing  Setup 

5.1.1  Special  Considerations 

Testing  superconductor  circuits  has  some  special  considerations.  First,  it  requires  cooling. 
Chips  are  mounted  inside  a  probe  head  and  immersed  in  the  liquid  helium  to  be  cooled  to  4.2  K. 
The  cables  inside  the  probe  body  connect  the  signal  pads  inside  the  probe  head  to  the  BNC  or 
SMA  connectors  on  the  other  end  of  the  probe  for  testing. 

Second, superconductor circuits are very sensitive to flux trapping. The trapped flux is accompanied
by a circulating current in the superconductor loop. A stray magnetic field present while
the circuit cools to the superconducting state, or a large transient current, can cause flux
trapping. There are several ways to combat this issue. A double-layer magnetic shield enclosing
the probe head prevents the earth's magnetic field from entering the chip. Another magnetic
shield layer is built into the liquid helium dewar used for this work. All the shields need to be
degaussed to remove the residual magnetic field from the shields themselves. The degaussing of
the cylinder shield for the probe head can be done using an external degausser. With the degausser
turned on, the cylinder shield is dragged through the center of the degausser coils and slowly moved away
from the degausser until the field is weak enough. For the inner layer of the double-layer shield, the
degaussing is done in situ with the outer shield in place. Coils are wrapped around the
inner shield, and an exponentially decaying ac current is supplied to the coils to generate a decaying
magnetic field for degaussing. With proper degaussing, the magnetic field can be reduced to about the 1
mG level inside the double shield. Degaussing needs to be done before the chip is cooled. External
cable connections should also be made before cooling to avoid unnecessary current spikes. There is a
big blue dewar in our laboratory whose magnetic shield is wrapped with coils. With proper degaussing,
the magnetic field can be reduced to about 1 µG in the sweet spot. The sweet spot range is
about 10 inches along the vertical axis. That small range and the fast evaporation of the liquid helium
in this dewar make it not very useful in practice. The magnetic shields in the other dewars used for this
project cannot be degaussed in situ. The testing does not show better results or less flux trapping
with the big blue dewar. Despite all this effort, flux trapping is still unavoidable from time to time.
Once flux is trapped, the only way to remove it is to heat the chip or lift the probe out of the helium so that
the chip warms up by itself and returns to the normal conducting state. Adding moats (slots cut from
ground planes) surrounding circuits on-die proved to be an effective approach [63]. For a 5 mm × 5 mm
chip in a 1 mG magnetic field, BA/Φ0 = 1 mG × 5 mm × 5 mm / (20.7 G·µm²) = 1208. That is one flux
quantum for every 20,695 µm², or 144 µm × 144 µm. The area enclosed and protected by each
moat should be smaller than this value.
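The flux-quantum estimate above can be checked numerically. The following sketch assumes a uniform residual field over the chip and uses the flux quantum expressed as 20.7 G·µm²; the function names are illustrative only.

```python
import math

PHI0_G_UM2 = 20.7  # flux quantum, about 2.07e-15 Wb, expressed in gauss * um^2

def flux_quanta(field_gauss, width_um, height_um):
    """Number of flux quanta threading a rectangular area in a uniform field."""
    return field_gauss * width_um * height_um / PHI0_G_UM2

def area_per_quantum_um2(field_gauss):
    """Area that encloses exactly one flux quantum at the given field."""
    return PHI0_G_UM2 / field_gauss

# 5 mm x 5 mm chip in a 1 mG residual field
quanta = flux_quanta(1e-3, 5000, 5000)
area = area_per_quantum_um2(1e-3)
side_um = math.sqrt(area)
print(f"{quanta:.0f} flux quanta, one per {side_um:.0f} um square")
```

Without intermediate rounding, the area per quantum comes out at about 20,700 µm², marginally above the 20,695 µm² obtained by dividing the chip area by the rounded count of 1208. The side length is about 144 µm either way.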

Third, electrical shielding and impedance matching are very important for measuring the
high-frequency, low-voltage signals. Two kinds of probes are used in our testing: a low-speed probe and
a high-speed probe. The low-speed probe has 40 signal pads and four ground pads. The 40 signal
pads are connected to the center conductors of the 40 BNC connectors. The four ground pads are connected
to the BNC connector grounds and also to the metal shield covering the signal wires
inside the probe body. The high-speed probe has 24 signal pads, which are connected
to the center conductors of the 24 SMA connectors on the other end. Each signal line has its own
ground shielding to form a 50 Ω transmission line. On the probe head, a coplanar waveguide
layout is used to maintain the 50 Ω impedance match.

Figure 5.1 The equipment setup for the low-speed testing experiment.

5.1.2  Low-Speed  Testing  Setup 

Fig. 5.1 shows a typical low-speed testing setup. The input data patterns are programmed and
generated by an HP 8175A digital signal generator. The signal amplitude and offset can be further
adjusted by the attenuator and level shifter to meet the requirements of the on-die DC/SFQ circuit.
The dc power supply sets the test chip bias voltages. Output waveforms, typically of 100 µV amplitude,
are observed on a Tektronix 7854 oscilloscope. A sync signal is sent from the signal generator
to the oscilloscope as the trigger signal. The low-speed signal data rate is in the range of 1 kHz to a
few hundred kilohertz, and its amplitude is about 100 mV with some negative offset voltage. The
low-speed testing is used to confirm the circuit functionality.




Figure  5.2  The  equipment  setup  for  medium-speed  testing. 


5.1.3  Medium-Speed  and  High-Speed  Testing  Setup 

Fig. 5.2 shows a typical medium-speed testing setup. Data patterns with frequencies up to one
gigahertz can be programmed and generated by the HP 8000 data generator. The high-speed attenuator
and bias-T elements can be used to further adjust the amplitude and offset of the input signals. The
input signal requirement is the same as in the low-speed test. The high-speed output signals are
pre-amplified from the 100 µV level to a few mV and then observed on the Tektronix 11801A
sampling oscilloscope, which has a bandwidth of 20 GHz. The noise level of the sampling oscilloscope
is about 2 mV, so pre-amplification of the output signals is required. Another technique
for observing the small signal on the sampling oscilloscope is averaging. In this way the noise from
the amplifier is averaged out while the signal remains. The signal-to-noise ratio (SNR) is improved by
the square root of the number of averages. The power splitters can be used to probe the input signals
and observe them on the oscilloscope. This setup can be used to test circuits from tens of megahertz
up to one gigahertz.

Figure 5.3 The equipment setup for high-speed testing.
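The square-root rule for averaging can be turned into a quick estimate of how many averages a measurement needs. The sketch below assumes the noise is uncorrelated and zero-mean from sweep to sweep, which is the condition under which the rule holds; the function names are illustrative only.

```python
import math

def snr_improvement(num_averages):
    """Amplitude SNR improvement factor from averaging N sweeps."""
    return math.sqrt(num_averages)

def averages_for(improvement):
    """Smallest number of averages giving at least the requested improvement."""
    return math.ceil(improvement ** 2)

# Example: a 20x SNR improvement takes 400 averages
needed = averages_for(20)
gain_100 = snr_improvement(100)
```

Because the improvement grows only as the square root, each additional factor of 10 in SNR costs a factor of 100 in averaging time.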


Fig. 5.3 shows a high-speed setup. The HP 71612A BERT system can generate NRZ random data
patterns at up to 12.5 GHz and 12.5 GHz clock outputs. The high-speed output signals are amplified
by a wide-band Anritsu amplifier (gain 28 dB, BW 0.03 - 10 GHz) to a few mV and observed
on the Tektronix 11801A sampling oscilloscope. This setup can verify circuits up to 10 GHz.
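As a quick check of the amplifier numbers, a gain in decibels can be converted to a voltage ratio. The sketch assumes the quoted 28 dB gain applies in a matched 50 Ω system, so that the voltage ratio is 10^(dB/20), and assumes a 100 µV input level as quoted for the output signals; the function names are illustrative.

```python
def db_to_voltage_ratio(gain_db):
    """Voltage ratio corresponding to a gain in dB (matched impedances assumed)."""
    return 10 ** (gain_db / 20)

def amplified_uv(input_uv, gain_db):
    """Output amplitude in microvolts for a given input amplitude and gain."""
    return input_uv * db_to_voltage_ratio(gain_db)

ratio = db_to_voltage_ratio(28)   # roughly a factor of 25
out_uv = amplified_uv(100, 28)    # roughly 2.5 mV for a 100 uV input
```

A 100 µV signal therefore reaches roughly 2.5 mV, consistent with the "few mV" level stated above.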

5.2  Testing  Results 

5.2.1  MUX  Testing  Results 

5.2.1.1 Low-Speed Testing Results of a 2:1 MUX

Shown in Fig. 5.4a is the micrograph of a 2:1 MUX fabricated in the HYPRES 1 kA/cm² Nb process.
The size of the circuit is approximately 700 µm × 700 µm.

Shown in Fig. 5.4b are the measured output waveforms at 250 kHz. The input patterns are not
shown here. Input1 is “0 0 0 0” at 125 kHz; Input2 is “1 0 1 0” at 125 kHz. So the output signals
should be Output “0 1 0 0 0 1 0 0” at 250 kHz and $\overline{\mathrm{Output}}$ “1 0 1 1 1 0 1 1” at 250 kHz. As
explained in Section 1.3.4, in each clock cycle a transition in the output waveform means “1” and no
transition means “0”. Voltage levels do not represent “0” and “1”. Other input patterns not shown
here were also tested successfully.
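The expected output patterns and the transition-based readout can be reproduced with a small behavioral model. This is only an illustrative sketch: it assumes the MUX emits the Input1 bit before the Input2 bit in each input period, which is consistent with the patterns quoted above, and it models the output stage as a toggle that flips its level on every “1”. All function names are hypothetical.

```python
def mux2(input1, input2):
    """Interleave two bit streams, Input1 bit first (assumed ordering)."""
    out = []
    for a, b in zip(input1, input2):
        out.extend([a, b])
    return out

def complement(bits):
    """Bitwise complement of a bit stream."""
    return [1 - b for b in bits]

def toggle_levels(bits, start=0):
    """Voltage level after each clock when every '1' toggles the output."""
    level = start
    levels = []
    for bit in bits:
        if bit:
            level = 1 - level
        levels.append(level)
    return levels

output = mux2([0, 0, 0, 0], [1, 0, 1, 0])
output_bar = complement(output)
levels = toggle_levels(output)
```

The `levels` list shows why the voltage level alone does not encode the data: the level stays high across several “0” bits after a single “1”.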

The  measured  dc  bias  margins  are  (-7%,  7%). 

5.2.1.2 Medium-Speed and High-Speed Testing Results of a 2:1 MUX

Shown in Fig. 5.5 are the 5 MHz testing results for the MUX using the setup in Fig. 5.2. The input
signals Clk, Input1, and Input2 are normal RZ patterns, observed on the oscilloscope before entering
the test chip. Clk is at a 5 MHz rate. Input1 is a “1 1 1 1 1” pattern at 2.5 MHz. Input2 is an all-zeros
pattern, not shown in the figure. So Output is a “1010101010” pattern and $\overline{\mathrm{Output}}$ is the complementary
“0101010101” pattern. Again, transitions in the output waveforms mean “1”.

Shown in Fig. 5.6 are testing results of the same test chip at 3.5 GHz using the setup in Fig. 5.3.
We observed correct function with two different input patterns. Fig. 5.6a has the same input patterns
as in Fig. 5.5 at a 3.5 GHz clock rate. Fig. 5.6b has Input1 “1 1 1 1 1” at 1.75 GHz and Input2
“1 1 1 1 1” at 1.75 GHz. The output data patterns are Output “1111111111” at 3.5 GHz and $\overline{\mathrm{Output}}$
“0000000000” at 3.5 GHz.

Figure 5.4 Testing results of a 2:1 MUX at 250 kHz. (a) Micrograph of a 2:1 RSFQ MUX.
(b) Output waveforms. 100 µV/div on y-axis, 5 µs/div on x-axis.

The dc bias margins in these measurements are very small, probably because of flux trapping.
These measurements were performed about two years after the low-speed testing was done. Material
degradation could be one reason the chips became prone to flux trapping.

5.2.2  DEMUX  Testing  Results 

5.2.2.1 Low-Speed Testing Results of a 1:2 DEMUX

Shown in Fig. 5.7 is the testing waveform of the 1:2 DEMUX shown in Fig. 3.8. It is a 20 GHz
design fabricated in the HYPRES 1 kA/cm² Nb process.


Figure 5.5 Testing results of a 2:1 MUX at 5 MHz. Traces: Clk “1111111111” @ 5 MHz;
Input1 “1 1 1 1 1” @ 2.5 MHz; Output “1010101010” @ 5 MHz; $\overline{\mathrm{Output}}$ “0101010101” @ 5 MHz.
50 mV/div on y-axis for Clk and Input1. 5 mV/div on y-axis for Output and $\overline{\mathrm{Output}}$.
200 ns/div on x-axis for all signals.

Figure 5.6 Testing results of a 2:1 MUX at 3.5 GHz for two different input patterns.
(a) Input1 “11111”, Input2 “00000”. Traces: Input1 “1 1 1 1 1” @ 1.75 GHz; Output “1010101010”
@ 3.5 GHz; $\overline{\mathrm{Output}}$ “0101010101” @ 3.5 GHz. (b) Input1 “11111”, Input2 “11111”. Traces: Input1
“1 1 1 1 1” @ 1.75 GHz; Input2 “1 1 1 1 1” @ 1.75 GHz; Output “1111111111” @ 3.5 GHz;
$\overline{\mathrm{Output}}$ “0000000000” @ 3.5 GHz. 50 mV/div on y-axis for Input1 and Input2. 5 mV/div on y-axis for
Output and $\overline{\mathrm{Output}}$. 500 ps/div on the x-axis for all signals.


Figure 5.7 Testing results of a 1:2 DEMUX at 1 kHz. The scales of the waveforms are 100 µV/div for the y-axis and 1 ms/div for the x-axis.

The input waveforms shown here are the outputs of SFQ/DC converters that monitor the on-die input SFQ signals, so each transition represents a “1”. The complementary inputs are Input “11101110” and $\overline{\text{Input}}$ “00010001” at 1 kHz. The two pairs of complementary outputs are Output0 “1111”, $\overline{\text{Output0}}$ “0000” and Output1 “1010”, $\overline{\text{Output1}}$ “0101” at 500 Hz.
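The output patterns quoted above follow from simple de-interleaving of the input stream. As a behavioral illustration only (the function name `demux` and the convention that the first bit goes to Output0 are assumptions, not taken from the thesis), a 1:N DEMUX can be sketched in Python as:

```python
from typing import List


def demux(bits: str, n: int) -> List[str]:
    """Split a serial bit string into n output streams, round-robin.

    Output k receives bits k, k+n, k+2n, ... so each output runs at
    1/n of the input rate.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [bits[k::n] for k in range(n)]


# 1:2 DEMUX with the input pattern used in the low-speed test
output0, output1 = demux("11101110", 2)
print(output0, output1)  # 1111 1010

# Output rate for a 1 kHz input
print(1e3 / 2)  # 500.0 (Hz)
```

The same model with `n = 4` and an all-ones input gives an all-ones pattern on every output at a quarter of the input rate.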

The experimental dc bias margin is (-15%, +15%).

5.2.2.2 Medium-Speed Testing Results of a 1:2 DEMUX

Fig. 5.8 and Fig. 5.9 are the testing results of the same 1:2 DEMUX test chip as above, with the same input data patterns, at 10 MHz and 1 GHz. Input and $\overline{\text{Input}}$ are the input waveforms before they enter the test chip. Output0, $\overline{\text{Output0}}$ and Output1 give correct results; $\overline{\text{Output1}}$ does not. The dc bias margin for the three working outputs remains (-15%, +15%) up to 100 MHz, and it is (-13%, +13%) at 1 GHz. Outputs were not terminated on this test chip, so reflections distorted the Output1 waveform at 1 GHz. The failure at $\overline{\text{Output1}}$ is believed to be caused by flux trapping, which persisted in spite of repeated defluxing efforts. This was an old chip: medium-speed and high-speed testing were performed about two years after it was fabricated. If the circuit function is verified at 1 kHz, it should work easily at 1 MHz, which is a very low speed for RSFQ circuits, but it did not. Defluxing in the usual way was not successful, probably as a result of degradation of the niobium.


Figure 5.8 Testing results of a 1:2 DEMUX at 10 MHz. 50 mV/div on y-axis for Input, $\overline{\text{Input}}$. 2 mV/div on y-axis for Output0, $\overline{\text{Output0}}$, Output1, $\overline{\text{Output1}}$. 200 ns/div on x-axis for all signals.

Figure 5.9 Testing results of a 1:2 DEMUX at 1 GHz. 50 mV/div on y-axis for Input, $\overline{\text{Input}}$. 2 mV/div on y-axis for Output0, $\overline{\text{Output0}}$, Output1, $\overline{\text{Output1}}$. 2 ns/div on x-axis for all signals.




5.2.2.3 Medium-Speed Testing Results of a 1:4 DEMUX

Shown in Fig. 5.10a is the micrograph of a 1:4 DEMUX fabricated in the HYPRES 1 kA/cm² Nb process. Fig. 5.10b shows a testing result at 100 MHz. Input is “111111111111” at 100 MHz; $\overline{\text{Input}}$ is all zeros and is not shown in the figure. Correct functioning was observed, with Output4 “111” at 25 MHz and $\overline{\text{Output4}}$ all zeros.

Figure 5.10 Testing results of a 1:4 DEMUX at 100 MHz. (a) Micrograph. (b) Waveforms: Input “111111111111” @ 100 MHz, Input Monitor, Output4 “111” @ 25 MHz, $\overline{\text{Output4}}$ “000” @ 25 MHz. 50 mV/div on y-axis for Input. 2 mV/div on y-axis for Input Monitor, Output4, $\overline{\text{Output4}}$. 20 ns/div on x-axis for all signals.

Figure 5.11 Testing results of a 1:4 DEMUX at 1 GHz. 50 mV/div on y-axis for Input. 2 mV/div on y-axis for Input Monitor, Output4, $\overline{\text{Output4}}$. 2 ns/div on x-axis for all signals.


Fig. 5.11 shows the correct testing results of the same 1:4 DEMUX with the same input pattern at 1 GHz. Proper termination resistors were added in this test chip, so the waveform is not distorted as in Fig. 5.9.

No  dc  bias  margins  were  recorded  at  100  MHz  and  at  1  GHz.  However,  at  1  kHz,  the  dc  bias 
margins  (-6.5%,  +6.5%)  were  observed. 

5.2.2.4 High-Speed Testing Results of a 1:4 DEMUX

Fig. 5.12 shows the direct high-speed testing results of the same 1:4 DEMUX at 9.2 GHz, with the same input pattern as in Fig. 5.10 and Fig. 5.11. The outputs are at 2.3 GHz. The bandwidth of the amplifier used to amplify the output signals in this experiment is 3 GHz, so the observed Output4 waveform is closer to a sine wave than to a square wave. With a wider amplifier bandwidth, higher-speed operation should be observable, since no dc bias margin degradation was observed when the frequency was increased from 1 GHz to 9.2 GHz, although the margin is small. Flux trapping is again the main difficulty in measurement.

Figure 5.12 Testing results of a 1:4 DEMUX at 9.2 GHz. 20 mV/div on y-axis for Input. 2 mV/div on y-axis for Output4, $\overline{\text{Output4}}$. 200 ps/div on x-axis for all signals.

5.3  Unmeasured  Test  Chips 

Three sets of masks were made for circuits to be fabricated in the 1 kA/cm² UCB Nb process, and one set was made for the 6.5 kA/cm² UCB Nb process. Lack of funding prevented completion of the processing of these chips in our Microfabrication Laboratory. A future continuation of this project could use the designs presented here. The masks for the critical layers, including the junction definition layer AN and metal layers M1 and M2, were made by high-resolution e-beam writing at DuPont, so the junction areas and the inductances in the circuits have good mask control. We made the masks of all other layers in the Berkeley Microfabrication Laboratory.

Figure 5.13 Mask set No. 1 for UCB 1 kA/cm² Nb process.

Shown in Fig. 5.13 is mask set No. 1 for the UCB 1 kA/cm² Nb process. Each mask set can host four 5000 µm × 5000 µm chips. On the upper-right chip, we placed two circuits laid out for the HYPRES 1 kA/cm² Nb process that were previously verified. One circuit is the high-speed test system [55]. The other circuit is the 2-bit MUX, as in Fig. 5.4(a). They are good candidates for comparing the UCB 1 kA/cm² process with the HYPRES 1 kA/cm² process. Other diagnostic structures, such as a 50-Josephson-junction (JJ) series array, a resistor array and an M1/M2 cross-over, are put on the chips for process verification. These structures are placed on every chip whenever the space and the pin assignments allow. The other three chips belong to other projects. These chips are made to be tested in the 24-pad high-speed probe. The high-speed probe is preferred because of its better shielding and the higher testing speed it supports.

Figure 5.14 Mask set No. 2 for UCB 1 kA/cm² Nb process. Labeled blocks: 1:8 DEMUX; 1:4 DEMUX; 4:1 MUX with the old Dff with high-speed test system; 1:2 DEMUX with high-speed test system; MUX with the old Dff; 4:1 MUX with the old Dff; Old Dff; 4:1 MUX with the new Dff; New Dff; RSff.

Shown in Fig. 5.14 is mask set No. 2 for the UCB 1 kA/cm² Nb process. These four chips are all made for the 40-pad low-speed probe. We chose the low-speed probe layout for the larger number of available pads so that we are able to include more basic blocks for verification.

The RSff and Dffs used in the MUX are included in the test chip for verification. The layout of the Dff was previously verified in the HYPRES process, but its simulated and measured dc bias margins are not good, so a new improved version was made. 4:1 MUXs with both the old Dff and the new one are included in the test chip. Furthermore, an 8:1 MUX with the old Dff and a 4:1 MUX with the old Dff and with the high-speed test system are included on the test chip. The Dff used in the DDST shift register is also the old verified version.

A  1:4  DEMUX,  a  1:8  DEMUX  and  a  1:2  DEMUX  with  the  high-speed  test  system  are 
included  in  the  test  chip. 

With this test chip set, we are able to perform low-speed function verification from the basic blocks to the more complicated 8:1 MUX and 1:8 DEMUX circuits. We are also able to perform on-chip high-speed testing of a 4:1 MUX and a 1:2 DEMUX.

Shown in Fig. 5.15 is mask set No. 3 for the UCB 1 kA/cm² Nb process. The new improved 4-bit and 8-bit MUX and DEMUX with high-speed test systems are included. These circuits are difficult to fabricate in the Microlab environment due to their complexity. But if they are fabricated successfully, high-speed verification of the 8:1 MUX and the 1:8 DEMUX can be performed.

Compared to the HYPRES 1 kA/cm² Nb process layout, we added layer AN for both junction CE definition and anodization ring definition. The 24-pad and 40-pad frame layouts are modified to avoid non-orthogonal geometries for the masks made in the Microlab.

Fig. 5.16 shows the first mask set made for the UCB 6.5 kA/cm² Nb process. Even though we did not get successful experimental results from the 1 kA/cm² UCB process, we proceeded to work on 6.5 kA/cm² designs based on some promising high-Jc junction and circuit results from our group. We put the key, yet simple, blocks on the first run. If these blocks are verified successfully, we can build more complicated MUX and DEMUX circuits from these blocks in the next test chip.

Figure 5.15 Mask set No. 3 for UCB 1 kA/cm² Nb process. Labeled blocks: 1:8 DEMUX with high-speed test system; 8:1 MUX with the new Dff with high-speed test system; 4:1 MUX with the new Dff with high-speed test system; 1:4 DEMUX with high-speed test system.

In our plan, the first circuit to be tested is the Tff without DC/SFQ and SFQ/DC converters. It has only 11 junctions and can be verified by dc voltage measurement. Shown in Fig. 5.17 is a micrograph of the fabricated 6.5 kA/cm² Tff. When Vbias_Input is increased such that the bias current for the input junction is larger than its critical current, SFQ pulses are generated across the input junction and propagate through the JTLs to the input of the Tff. The frequency of the output SFQ pulses is half that of the input. The dc voltage measured at the input junction is $V_{\text{input}} = f_{\text{in}} \Phi_0$. The dc voltages measured at the output junctions are $V_{\text{output1}} = f_{\text{out}} \Phi_0$ and $V_{\text{output2}} = f_{\text{out}} \Phi_0$. Since $f_{\text{in}} = 2 f_{\text{out}}$, $V_{\text{output1}} = V_{\text{output2}} = V_{\text{input}}/2$.

Figure 5.16 Mask set No. 1 for UCB 6.5 kA/cm² Nb process.

Figure 5.17 A 6.5 kA/cm² Tff micrograph. Labeled pads: Input, Vbias_Input, Vbias_Tff, Output1, Output2.

Figure 5.18 A 6.5 kA/cm² 1:2 DEMUX micrograph.


Similarly, a 1:2 DEMUX is also planned to be verified through input/output dc voltage comparison. Fig. 5.18 shows a micrograph of the 1:2 DEMUX. In this layout, it has a total of 48 Josephson junctions. When Input is over-biased, we check $V_{\text{Output1}} = V_{\text{Output2}} = V_{\text{Input}}/2$. When $\overline{\text{Input}}$ is over-biased, we check $V_{\overline{\text{Output1}}} = V_{\overline{\text{Output2}}} = V_{\overline{\text{Input}}}/2$. This is not a complete test with random input patterns, but it is good enough to verify the DEMUX with one simple pattern up to very high speed without involving complicated test circuits, which would reduce the chance of success in the new technology.
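The dc check rests on the Josephson relation between average voltage and SFQ pulse rate, V = f Φ0. The short Python sketch below works through the numbers for an illustrative input-junction voltage of 100 µV; that value and the helper names are assumptions for the example, not measurements from this work. With each output carrying half the input pulse rate, each output junction should read half the input voltage.

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb (V*s)


def sfq_dc_voltage(pulse_rate_hz: float) -> float:
    """Average dc voltage across a junction emitting SFQ pulses at the given rate."""
    return pulse_rate_hz * PHI0


def sfq_pulse_rate(dc_voltage: float) -> float:
    """Pulse rate implied by a measured average dc voltage."""
    return dc_voltage / PHI0


v_input = 100e-6                      # assumed measured input voltage, in volts
f_in = sfq_pulse_rate(v_input)        # roughly 48 GHz
f_out = f_in / 2                      # Tff or 1:2 DEMUX output rate
v_output = sfq_dc_voltage(f_out)      # expected reading at each output junction

print(f"f_in  = {f_in / 1e9:.1f} GHz")
print(f"V_out = {v_output * 1e6:.1f} uV")  # 50.0 uV
```

Because only dc voltages are compared, the check reaches tens of gigahertz without any high-speed instrumentation.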

We chose to verify the DC/SFQ converter and the SFQ/DC converter since they are the necessary interface circuits for any RSFQ circuit to be tested with external pattern generator data. They are wide-margin circuits. But the smallest junction (Ic = 120 µA) in our junction library is used in these two circuits, which makes them challenging to fabricate.

Figure 5.19 Micrograph of two versions of 6.5 kA/cm² Dffs.

We also put two versions of Dffs on the first run since the Dff is a critical block used in our test system design and MUX design. One is a version ported from a previously verified Dff in the 1 kA/cm² process by modifying only the junction areas in the layout. The other one is the result of our optimization and is used in the 6.5 kA/cm² DDST SR layout.

The cgs and the high-speed test system are also put on the first run. If they are verified successfully, they can be applied for on-chip high-speed testing of the MUX and the DEMUX.

In the 6.5 kA/cm² chips, moats are added more systematically. The principle is that the magnetic flux inside a complete moat enclosure should be less than one magnetic flux quantum. For a square moat enclosure in a field B, this means that the area $A < \Phi_0/B$ and the length of one side $L < \sqrt{\Phi_0/B}$. For a 1 mG magnetic field, the moat size should be smaller than 144 µm × 144 µm. In our design, we chose the size for a 3 mG residual magnetic field, so the moat sizes are smaller than 83 µm × 83 µm.
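The moat dimensions quoted above can be reproduced directly from the one-flux-quantum criterion. The following Python sketch is a worked calculation only; the function name is hypothetical, and the conversion 1 G = 10⁻⁴ T is the only input beyond the values in the text.

```python
import math

PHI0 = 2.067833848e-15   # magnetic flux quantum in Wb
GAUSS_TO_TESLA = 1e-4


def max_moat_side_um(field_mG: float) -> float:
    """Largest side of a square moat enclosure (in micrometers) that keeps
    the enclosed flux below one flux quantum for the given field in mG."""
    field_tesla = field_mG * 1e-3 * GAUSS_TO_TESLA
    side_m = math.sqrt(PHI0 / field_tesla)
    return side_m * 1e6


for field in (1.0, 3.0):
    print(f"{field:.0f} mG -> {max_moat_side_um(field):.0f} um")
# 1 mG -> 144 um
# 3 mG -> 83 um
```

The side length scales as the inverse square root of the field, so tripling the design field from 1 mG to 3 mG shrinks the allowed side by a factor of about 1.7.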
5.4 Conclusion

Some successful testing results [64] were achieved in both low-speed testing and direct high-speed testing for the early-stage designs, where post-layout optimization was not implemented. The achieved dc bias margins are smaller than simulated. Flux trapping is a major obstacle in measurement in spite of all the effort made to improve the degaussing procedure.

The newer designs are improved in the following ways. 1. The circuits are optimized with extracted parasitic inductances. 2. Moats are added more systematically in the layout, surrounding the junction-inductor loops in the entire circuit area, to combat flux trapping. 3. All the input signals have impedance matching resistors and all the output signals have termination resistors added in the layout. So we expect better testing results when they are fabricated successfully.

APPENDIX

High-Tc Superconductor RSFQ Circuits; Monte-Carlo Analysis

A.1 Introduction

The main motivation for making high-Tc superconductor (HTS) digital circuits is the relative ease of refrigeration compared with that required for low-Tc superconductor (LTS) circuits. But due to fabrication and design difficulties, only small HTS digital circuits composed of 10-20 Josephson junctions have been demonstrated. To investigate how large, how fast and at how high a temperature such circuits can operate, a joint study was performed involving collaboration between UC Berkeley and three companies: TRW, Conductus, and Northrop Grumman. (TRW later became a part of Northrop Grumman.) Process and device information was supplied by the three companies. Some representative circuit designs under development were also provided by the three companies. UC Berkeley was responsible for carrying out the theoretical calculations to predict yield and bit-error-rate (BER) including thermal noise. An operating temperature of 40 K was chosen because of interest in refrigerators at that temperature.

Large process variations and thermal noise related to the higher operating temperature are the two main factors impeding implementation of larger HTS digital circuits. In this section, we will elaborate on these two challenges and other trade-offs in HTS RSFQ circuit design. Methodologies used to analyze these issues will be presented, with the focus on Monte Carlo calculations. In Section A.2, details of Monte Carlo calculations for two versions of HTS T flip-flops are presented and the effect of parasitic inductance is demonstrated. In Section A.3, the theoretical yield of a counter circuit consisting of three stages of T flip-flops is calculated. In Section A.4, a conclusion will be drawn and direction will be given based on the above calculation results.

In the well-developed LTS tunnel junction technology, we have to shunt the Josephson junction with an external resistor to achieve the proper nonhysteretic I-V characteristics used by RSFQ circuits. HTS junctions made from the YBa$_2$Cu$_3$O$_{7-\delta}$ material have an intrinsic nonhysteretic I-V characteristic, which makes the RSFQ logic family a natural choice for HTS digital circuits.

HTS circuit design is challenging due to undesirable material and process limitations. Due to the larger penetration depth in HTS materials, the minimum realizable inductance per square is about 1 pH. In layout, it is hard to make a loop with less than 4 squares (Lmin ≈ 4 pH). In an RSFQ circuit, the typical loop has $I_c L = \Phi_0/2$, so an Lmin of 4 pH determines Ic,max ≈ 250 µA. However, in HTS, a larger Ic is desired to combat the more significant thermal noise, so Lmin imposes an undesirable design constraint. Furthermore, the parasitic inductance between the junctions and the ground plane is about 1 to 3 pH, which is harmful to circuit margins: the series linear inductance weakens the effectiveness of the nonlinearity of the switching junction. A larger IcRn is desired so the circuit can run faster. With Ic limited, we would like to increase Rn. But for HTS junctions, Ic and Rn are correlated. When the process is adjusted to achieve higher Rn, Ic may be reduced, so IcRn is limited by the process.
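The bound on the critical current follows in one step from the loop condition Ic·L = Φ0/2. A small Python calculation, given here only as a worked example with a hypothetical helper name, shows where the figure of roughly 250 µA comes from:

```python
PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def max_critical_current(l_min_henry: float) -> float:
    """Largest Ic compatible with Ic * L = PHI0 / 2 when L cannot go below l_min."""
    return PHI0 / (2.0 * l_min_henry)


ic_max = max_critical_current(4e-12)   # Lmin of about 4 pH (4 squares at 1 pH/square)
print(f"Ic,max = {ic_max * 1e6:.0f} uA")  # about 258 uA, i.e. roughly 250 uA
```

Halving the minimum inductance would double the allowed critical current, which is why the inductance per square is such a strong constraint when thermal noise calls for larger junctions.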

With the circuit design requirements in mind, we have studied the collected state-of-the-art HTS junction information [65][66][67] and written a junction model for the WRspice simulation program.

.model ybco jj(rtype=1, cct=1, icon=10m, vg=2.8m delv=0.08m
+ icrit=0.5m, r0=1, rn=1, cap=0.0025p)

In this model, IcRn = 500 µV and $\beta_c = (2\pi/\Phi_0)(I_cR_n)(CR_n) = 3.8 \times 10^{-3}$. This is based on the measurement of Ic and Rn. The determination of the junction capacitance is more ambiguous. Fortunately, with βc ≪ 1 in HTS junctions, the accuracy of the capacitance value is not important. In other words, a change of one or two orders of magnitude in the capacitance value in the model will not much affect the circuit performance. This was verified by a JTL pulse-width simulation in which the capacitance value was increased 100 times. The IcRn value of 500 µV is close to the value of 592 µV in the LTS 6.5 kA/cm² Nb process. This enables a circuit such as a T flip-flop to run at above 100 GHz. Jc, Ic and IcRn are functions of temperature, and all three decrease with increasing temperature. For junctions operated at a temperature different from 40 K, the above junction model should be modified.
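The two figures of merit quoted for the junction model can be recomputed from its icrit, rn and cap parameters. The Python sketch below is a worked check only; the function names are illustrative and not part of WRspice.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum in Wb


def ic_rn_product(ic: float, rn: float) -> float:
    """Characteristic voltage Ic * Rn in volts."""
    return ic * rn


def stewart_mccumber(ic: float, rn: float, cap: float) -> float:
    """beta_c = (2*pi/PHI0) * (Ic*Rn) * (C*Rn)."""
    return (2.0 * math.pi / PHI0) * (ic * rn) * (cap * rn)


# Parameters of the model above: icrit=0.5m, rn=1, cap=0.0025p
ic, rn, cap = 0.5e-3, 1.0, 0.0025e-12

print(f"IcRn   = {ic_rn_product(ic, rn) * 1e6:.0f} uV")      # 500 uV
print(f"beta_c = {stewart_mccumber(ic, rn, cap):.1e}")       # 3.8e-03
```

Even a hundredfold increase in `cap` leaves βc below 1, consistent with the statement that the exact capacitance value matters little for these junctions.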

Severe process variations prevent implementation of large HTS circuits. At the time of this study, the standard deviation of the HTS junction critical current was about 10%, which is several times larger than that in LTS. The process variation of inductance is also larger in HTS. The circuit yield is therefore expected to be low. But how low is it? And how does the yield decrease with increasing circuit size? Monte Carlo analysis is done here to explore these issues and provide a theoretical answer. The process variations can be divided into two categories: global variations and local variations. The global variations reflect the parameter spreads from lot to lot, from wafer to wafer and from chip to chip. The local variations are the parameter spreads on the same chip. In our Monte Carlo analysis, circuit yield is defined as the success rate among the total runs (usually >100 runs). In each run, the circuit parameters are pseudo-randomly generated by the simulator based on the global and local variations. The circuit parameters are assumed to have a Gaussian distribution with the mean values as designed.
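The sampling procedure described above can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the WRspice flow used in the study: the three-element "circuit", the ±30% pass window and all function names are hypothetical stand-ins, and the pass test replaces what in the actual analysis is a full transient simulation. The spread values are taken from the global and state-of-the-art local variations reported in this appendix.

```python
import random

# 1-sigma fractional spreads (global: lot/wafer/chip; local: within a chip)
GLOBAL_SIGMA = {"Ic": 0.00, "L": 0.15, "R": 0.12}
LOCAL_SIGMA = {"Ic": 0.10, "L": 0.15, "R": 0.04}

# Hypothetical circuit: (parameter kind, nominal value)
NOMINAL = [("Ic", 250e-6), ("L", 4e-12), ("R", 1.0)]


def sample_parameters(rng: random.Random) -> list:
    """Draw one set of circuit parameters for a single Monte Carlo run."""
    # One global factor per parameter kind, shared by every element on the chip
    global_factor = {k: rng.gauss(1.0, s) for k, s in GLOBAL_SIGMA.items()}
    # Each element additionally gets its own local factor
    return [nom * global_factor[kind] * rng.gauss(1.0, LOCAL_SIGMA[kind])
            for kind, nom in NOMINAL]


def circuit_passes(values: list, margin: float = 0.30) -> bool:
    """Placeholder pass criterion: every parameter within +/- margin of nominal.
    The real criterion is correct operation in a transient simulation."""
    return all(abs(v / nom - 1.0) <= margin
               for v, (_, nom) in zip(values, NOMINAL))


def monte_carlo_yield(runs: int = 100, seed: int = 0) -> float:
    """Yield = fraction of runs whose sampled circuit passes."""
    rng = random.Random(seed)
    passes = sum(circuit_passes(sample_parameters(rng)) for _ in range(runs))
    return passes / runs


print(f"yield = {monte_carlo_yield():.2f}")
```

Separating the global and local factors matters because a global shift moves all elements together, which a circuit tolerates differently from uncorrelated element-to-element mismatch.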

The process variations used in our calculation are listed in the tables below.

TABLE A-1 HTS global process variations (1σ value)

Jc      IcRn    L       R
0%      0%      15%     12%

The global variations of Jc and IcRn are not investigated here. It was agreed to screen the samples under study to have the target Jc and IcRn values.

TABLE A-2 HTS local process variations (1σ value)

                            Jc      IcRn    L       R
Ideal spreads               5%      2.5%    5%      4%
State-of-the-art spreads    10%     5%      15%     4%
Medium spreads              15%     10%     10%     4%
Large spreads               25%     15%     20%     4%

For local process variations, the state-of-the-art process variations were collected from the three companies. A set of ideal process variations, equivalent to the state of the art in LTS, was defined to see how much the circuit yield can be improved with better process control. Simulations with the more realistic (medium) and the sloppy (large) sets of process variations reveal how the yield deteriorates when the process control is worse than the state of the art.

By the statistical nature of the Monte Carlo analysis, the yield is not a certain value. It has a Gaussian distribution. The calculated yield Y is the mean value, and the variance of the yield is $\sigma^2 = Y(1-Y)/N$, where N is the total number of runs, equal to 100 in our calculations. For a 95% confidence level, the confidence interval is $L = 2\sigma = 2\sqrt{Y(1-Y)/N}$. The predicted yields lie in the range $Y \pm L$ with a 95% probability.
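As a worked example of this formula, the Python sketch below computes the yield and its 95% interval from a pass count. The input of 80 passes in 100 runs is an illustrative value chosen for round numbers, not a result from this study, and the function name is hypothetical.

```python
import math


def yield_with_interval(passes: int, runs: int) -> tuple:
    """Return (Y, L) where Y is the yield and L = 2*sqrt(Y*(1-Y)/N)."""
    if runs <= 0 or not 0 <= passes <= runs:
        raise ValueError("need 0 <= passes <= runs and runs > 0")
    y = passes / runs
    half_width = 2.0 * math.sqrt(y * (1.0 - y) / runs)
    return y, half_width


y, l = yield_with_interval(80, 100)
print(f"Y = {y:.1%} (+/- {l:.1%})")  # Y = 80.0% (+/- 8.0%)
```

The interval narrows only as the square root of the number of runs, so halving it requires four times as many simulations.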

Another issue in HTS circuit design is thermal noise related to the higher operating temperature (40-70 K vs. 4.2 K in LTS). The thermal noise can be modeled by a random current source in parallel with each resistor or junction in the circuit. The rms value of the current fluctuations is given by the Nyquist formula

$$I_{\text{rms}} = \sqrt{\frac{4kTf_c}{R}}$$

where k is Boltzmann's constant, T is temperature, R is the resistance or Rn of the junction, and fc is the cutoff frequency of the noise frequency band. In WRspice, a random Gaussian noise is generated in the time domain, defined by

@ define noise(R,T,A,n) gauss(sqrt(4*boltz*T/(R*2*A)),0,A,n)

where A = 1/(2fc) is the time spacing between two random numbers and n is an integer that defines the interpolation type, either first-order interpolated or piecewise linear steps. The simulation time step should be much smaller than A to ensure interpolation algorithm stability, and A should be small compared to the time constant of the circuit.
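To give a feel for the magnitudes involved, the Python sketch below evaluates the Nyquist formula and draws noise samples on a time grid of spacing A. It is an illustration only: the spacing of 0.5 ps (so fc = 1 THz) and the 1 Ω resistance are assumed example values, the function names are hypothetical, and the sampling mimics but does not reproduce the WRspice source.

```python
import math
import random

BOLTZMANN = 1.380649e-23  # J/K


def noise_rms_current(resistance: float, temperature: float, f_cutoff: float) -> float:
    """Nyquist rms noise current: sqrt(4*k*T*fc/R)."""
    return math.sqrt(4.0 * BOLTZMANN * temperature * f_cutoff / resistance)


def noise_samples(resistance: float, temperature: float, spacing: float,
                  count: int, seed: int = 0) -> list:
    """Gaussian current samples spaced `spacing` seconds apart, with fc = 1/(2*spacing)."""
    f_cutoff = 1.0 / (2.0 * spacing)
    sigma = noise_rms_current(resistance, temperature, f_cutoff)
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(count)]


# Assumed example: 1 ohm at 40 K with A = 0.5 ps
i_rms = noise_rms_current(1.0, 40.0, 1.0 / (2.0 * 0.5e-12))
print(f"I_rms = {i_rms * 1e6:.1f} uA")  # 47.0 uA

samples = noise_samples(1.0, 40.0, 0.5e-12, count=5)
```

Under these assumed values the noise current is a sizable fraction of a junction critical current of a few hundred microamperes, which illustrates why larger Ic is desirable at HTS operating temperatures.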

Simulations including the above-defined thermal noise, with and without process variation, were used to predict BER [69][70][71]. A combination of Monte Carlo analysis and thermal noise in transient simulation can predict both the yield and the BER more accurately. The Monte Carlo analysis reported in the following sections considers only process variations in order to keep the computation time within reasonable bounds.

Figure A.1 TRW T flip-flop schematic.


A.2  Monte-Carlo  Calculation  on  T  Flip-Flops 

A.2.1  TRW  T  Flip-Flop 

The first circuit we studied is a toggle, or “T”, flip-flop, shown in Fig. A.1. A. G. Sun at TRW provided us with the original design, which was optimized in MALT with the extracted parasitic inductance. (They later reported this T flip-flop, with some parameter changes, working at 65 K [68].)

The Sun design has a total of 14 junctions and includes parasitic inductances. We can see that the parasitic inductance is on the order of 1 to 3 pH. On the left, B0, L0, B1, L1 and L14 form a dc-to-SFQ converter. On the right, B6, B7, B8, B9, B10 and the related inductors and bias current sources form the Tff core. In between are some connection JTLs. Junctions B11, B12 and inductors L11, L12, L13 form a monitor to detect the state of the Tff. A voltage-controlled voltage source E0 and the RC network are added here purely for our simulations; they are used to test the average voltage at the node that E0 is monitoring. A triangle waveform fed through I0 is converted to SFQ pulse trains across B1. The pulses travel down the JTLs and switch B8 and B7 in turn. The voltage at the output of E0 will switch between two values.

We took the circuit parameters and simulated the circuit with the original Sun junction model ybcotrw, listed below, and the new model ybco to confirm its operation. Fig. A.2 shows example simulation waveforms at 50 GHz using the new model ybco. Fig. A.2a shows the node voltages at B5, B8, B7 and after the output monitoring RC filter. The first three nodes represent the input and the two outputs of the T flip-flop. The input pulses are diverted to the two outputs alternately. The filter output switches between 0 and an average voltage of about 0.25 mV, corresponding to each output switching at B8 and B7. Fig. A.2b shows the phase waveforms of B5, B8, B7. These phase values and the filter output voltages are monitored in simulation to judge circuit pass/failure.

Figure A.2 Simulation waveform of TRW Tff at 50 GHz. (a) Voltage waveforms. (b) Phase waveforms.

For  reference,  the  Sun  junction  model  is  listed  below. 

.model ybcotrw jj(rtype=1, cct=1, icon=10m, vg=2.8m delv=0.08m
+ icrit=0.16m, r0=0.469, rn=0.469, cap=0.05p)

It has an IcRn value of 75 µV and $\beta_c = 5.3 \times 10^{-3}$. The new model ybco has an improved IcRn value of 500 µV, which reflects the progress in the HTS junction process, so the circuit can be operated at a higher speed. But we did not re-optimize the circuit for the new junction model because we reasoned that the IcRn value should not change circuit optimization results at low speed, where pulse interference does not impact circuit operation.

Table A-3 lists the calculated yield based on the Sun model. Some other results were previously reported by R. Xie [69]. The improvement here is that the circuit pass/failure criteria were examined and modified, so the yield values are better in this report.

TABLE A-3 TRW HTS Tff theoretical yield with IcRn = 75 µV

Yield (95% confidence level)

Process variation       5 GHz             10 GHz
State-of-art spreads    52.9% (±9.1%)     50.4% (±9.1%)
Ideal spreads           94.2% (±4.3%)     84.3% (±6.6%)

With IcRn = 75 µV, the yield of the Tff is not very good for the state-of-art spreads. The yield at 5 GHz is about 52.9% (±9.1%). Better process control with the ideal spreads can improve the yield at 5 GHz to 94.2% (±4.3%). The severe reduction of yield from the ideal spreads to the state-of-the-art spreads for IcRn = 75 µV implies that the parameter margins of the optimized circuit are still not large enough to fight the process variations. Improving IcRn is necessary to improve the circuit yield at 5 GHz and higher speeds.


The yield calculations based on the new model with the improved IcRn (500 µV) are summarized in Table A-4.


TABLE A-4 TRW HTS Tff theoretical yield with IcRn = 500 µV

Yield (95% confidence level)

Process variation       5 GHz            10 GHz           20 GHz           50 GHz
State-of-art spreads    80.2% (±7.3%)    79.3% (±7.4%)    77.7% (±7.6%)    71.1% (±8.2%)
Ideal spreads           93.4% (±4.5%)    96.7% (±3.3%)    96.7% (±3.3%)    95.0% (±4.0%)

With the ideal spreads, the yield with the new IcRn value remains good (> 90%) up to 50 GHz, while with the old IcRn value the yield can drop below 80% at 10 GHz. At 5 GHz, the new yield is similar to the one with the lower IcRn. This supports our previous point that increasing the IcRn value from 75 µV to 500 µV does not require circuit re-optimization at low speed, where the pulse interference effect is negligible.


With the state-of-the-art spreads, the improved IcRn value improves the circuit yield considerably. At 5 GHz, the yield increases from 52.9% (±9.1%) to 80.2% (±7.3%). At 50 GHz, it still has a yield of 71.1% (±8.2%). Fig. A.3 illustrates the data in Table A-4.


A.2.2  Conductus  T  Flip-Flop 

Figure A.3 TRW Tff theoretical yield with IcRn = 500 µV.

Figure A.4 Conductus T flip-flop.

We also studied another T flip-flop, shown in Fig. A.4, from V. K. Kaplunenko at Conductus.
It does not contain any parasitic inductance associated with the junctions. The junction
model used for this circuit is

.model ybcocond jj(rtype=1, cct=1, icon=10m, vg=2.8m delv=0.08m
+ icrit=0.25m, r0=2,rn=2,cap=0.26p)

This model has an IcRn value of 500 µV and βc = 0.79. The calculated yields for this
idealized T flip-flop were published in [69] and are reproduced here for comparison with the
results of the TRW T flip-flop. Fig. A.5 illustrates the data in Table A-5.
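
As a check on the quoted figures, IcRn and βc can be recomputed from the model parameters.
The sketch below assumes the standard Stewart-McCumber definition
βc = 2π·Ic·Rn²·C/Φ0, evaluated with the rn value from the model card; the variable names are
illustrative.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb

# Values taken from the ybcocond model card
icrit = 0.25e-3   # critical current, A
rn = 2.0          # normal resistance, ohm
cap = 0.26e-12    # junction capacitance, F

ic_rn = icrit * rn                                   # characteristic voltage, V
beta_c = 2 * math.pi * icrit * rn**2 * cap / PHI0    # Stewart-McCumber parameter

print(f"IcRn = {ic_rn * 1e6:.0f} uV, beta_c = {beta_c:.2f}")
```

Under this definition the parameters give 500 µV and about 0.79, in agreement with the
values stated above.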


Figure A.5 Conductus idealized Tff theoretical yield with IcRn = 500 µV. [Plot: yield versus
speed (GHz) for ideal, state-of-the-art, medium, and large spreads.]


TABLE A-5 Conductus HTS Tff theoretical yield with IcRn = 500 µV

                        Yield (95% confidence level)
Process variation       2 GHz           30 GHz          50 GHz          71.4 GHz        83.3 GHz
State-of-art spreads    81.8% (±7.0%)   83.5% (±6.8%)   83.5% (±6.8%)   79.3% (±7.4%)   54.5% (±9.1%)
Ideal spreads           96.7% (±3.3%)   95.9% (±3.6%)   97.5% (±2.8%)   94.2% (±4.2%)   69.4% (±8.4%)
Medium spreads          66.1% (±8.6%)   63.6% (±8.7%)   62.8% (±8.8%)   67.8% (±8.5%)   36.4% (±8.7%)
Large spreads           40.5% (±8.9%)   43.8% (±9.0%)   32.2% (±8.5%)   27.3% (±8.1%)   20.7% (±7.4%)

Figure A.6 TRW 3b-counter.


With the state-of-the-art spreads, at a few gigahertz, the Conductus T flip-flop yield is
slightly higher than that of the TRW T flip-flop; both are around 80%. However, the Conductus
T flip-flop yield stays near this value up to about 70 GHz, whereas the yield of the TRW Tff
drops to about 70% at 50 GHz. With the ideal spreads, both T flip-flops have similarly good
yield up to 50 GHz.

Eliminating the junction parasitic inductance as far as possible is another way to improve
circuit parameter margins and yield. This requires developing a new junction-formation
process. With the state-of-the-art process at the time of the study, the parasitic inductance
was as high as 1-3 pH.

A.3  3-Stage  Counter 

We further investigated the yield of a counter consisting of three cascaded stages of the TRW
T flip-flop studied in Section A.2.1. The counter circuit schematic is shown in Fig. A.6. It
contains 38 junctions. In the Monte Carlo analysis, the output junction phases and the voltage
after the RC filter of all three stages were monitored to judge the success of the circuit
operation. The calculated yield results are listed in Table A-6. Fig. A.7 illustrates the data
in Table A-6.


TABLE A-6 TRW HTS 3-stage counter theoretical yield with IcRn = 500 µV

                        Yield (95% confidence level)
Process variation       10 GHz          20 GHz          50 GHz
State-of-art spreads    45.5% (±9.1%)   42.1% (±9.0%)   33.9% (±8.6%)
Ideal spreads           76.9% (±7.7%)   71.9% (±8.2%)   64.5% (±8.7%)

The 3b-counter yield values are much lower than those of the single-stage T flip-flop. With
the state-of-the-art spreads, at 10 GHz, the yield drops from 79.3% (±7.4%) to 45.5% (±9.1%).
At 50 GHz, it drops from 71.1% (±8.2%) to 33.9% (±8.6%). Even with the ideal spreads, at
10 GHz, the yield drops from 96.7% (±3.3%) to 76.9% (±7.7%). At 50 GHz, it drops from 95.0%
(±4.0%) to 64.5% (±8.7%).
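
A rough way to see why the counter yield falls so far is to treat the three stages as
statistically independent, so that the counter yield would be about the single-stage yield
cubed. This is only an illustrative assumption, not the method used in the Monte Carlo study:
the real stages load one another, and the counter has 38 junctions rather than 3 × 14. The
function name is hypothetical.

```python
def cascade_yield(stage_yield: float, stages: int) -> float:
    """Yield of a chain if every stage passes independently (an assumption)."""
    return stage_yield ** stages

# (single-stage yield, simulated counter yield) taken from Tables A-4 and A-6
cases = {
    "state-of-the-art, 10 GHz": (0.793, 0.455),
    "state-of-the-art, 50 GHz": (0.711, 0.339),
    "ideal, 10 GHz": (0.967, 0.769),
    "ideal, 50 GHz": (0.950, 0.645),
}

for label, (tff, counter) in cases.items():
    est = cascade_yield(tff, 3)
    print(f"{label}: independent estimate {100 * est:.1f}%, simulated {100 * counter:.1f}%")
```

Under this assumption the estimate is close to the simulated value for the state-of-the-art
spreads (about 50% against 45.5% at 10 GHz) but well above it for the ideal spreads (about
90% against 76.9%), so it should be read as a coarse sanity check rather than a prediction.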


Figure A.7 TRW 3b-counter theoretical yield with IcRn = 500 µV. [Plot: yield versus speed
(GHz), 0 to 50 GHz.]

A.4 Conclusion and Future Work

A  few  conclusions  can  be  drawn  based  on  the  simulations  in  this  chapter. 

1. Without considering thermal noise, with the state-of-the-art process variations and IcRn
= 500 µV, the yield of a basic cell, the T flip-flop (14 junctions), is 77.7% (±7.6%) at
20 GHz and 71.1% (±8.2%) at 50 GHz. The yield of a medium-size circuit, the 3b-counter
(38 junctions), is 42.1% (±9.0%) at 20 GHz and 33.9% (±8.6%) at 50 GHz. These yield values
are too low to make any useful large HTS circuit, because the yield degrades rapidly as the
number of devices in the circuit increases. Improvements in several aspects can help increase
the yield.

2. The most important factor affecting the yield is the process variation. Process spreads
equivalent to the LTS state of the art can improve the yield of the HTS T flip-flop to 96.7%
(±3.3%) at 20 GHz and 95.0% (±4.0%) at 50 GHz. The yield of the 3b-counter is improved to
71.9% (±8.2%) at 20 GHz and 64.5% (±8.7%) at 50 GHz.

3. Increasing the junction IcRn value can increase the circuit maximum operation speed and
the circuit yield at high speed. With the state-of-the-art process variation, the yield of
the TRW T flip-flop is 50.4% (±9.1%) at 10 GHz for IcRn = 75 µV, compared to 71.1% (±8.2%) at
50 GHz for IcRn = 500 µV.

4.  Reducing  parasitic  inductances  is  favorable.  The  idealized  Conductus  T  flip-flop  without 
parasitic  inductance  has  a  better  yield  at  50  GHz  and  above  with  the  same  process  variations. 

BER calculations incorporating thermal noise in the WRspice transient analysis were performed
by M. Jefferey in this study. With IcRn = 250 µV and in the absence of parasitics, it appears
that BER < 10^-6 at T = 40 K is achievable even with a clock frequency as high as 100 GHz.
However, the BER worsens by at least one order of magnitude when the parasitics are taken
into account. A combination of Monte Carlo analysis and noise calculation shows that the
average BER of the ideal T flip-flop without parasitics at 50 GHz approximately doubles when
the state-of-the-art spreads are taken into account. With these spreads, it is estimated that
the temperature needs to be lowered to 20-30 K to get BER < 10^-6 [72]. Further study is
needed to confirm this.

The BER results show the importance of reducing parasitics. The yield results show the
importance of controlling process variation. Increasing IcRn raises the circuit maximum
operation speed and is favorable for both the BER and the yield at high speed. Improvements
in all three aspects are needed to obtain more robust HTS digital circuits.